Vol. X    Part I    May

THE
BRITISH JOURNAL
OF
STATISTICAL
PSYCHOLOGY

EDITED BY
SIR CYRIL BURT
WITH THE ASSISTANCE OF
CHARLOTTE BANKS AND ALAN STUART
AND THE FOLLOWING EDITORIAL BOARD
A. C. AITKEN        E. A. PEEL
M. S. BARTLETT      L. S. PENROSE
W. G. EMMETT        J. FRASER ROBERTS
M. G. KENDALL       A. RODGER
D. F. LAWLEY        W. STEPHENSON
E. S. PEARSON       P. E. VERNON

Managing Sub-editor: J. W. WHITFIELD

Printed and Published by
TAYLOR & FRANCIS LTD.
RED LION COURT, FLEET STREET, LONDON, E.C.4

Subscription 30s. 6d. per volume (U.S.A. $4.50)


PUBLICATIONS OF THE BRITISH PSYCHOLOGICAL SOCIETY 



The British Journal of Statistical Psychology is issued by the British Psychological Society. 
The subscription price per volume is 30s. net for Members of the Society and 30s. 6d. (post free)
for non-members. The subscription price in the U.S.A. is $4.50 (post free). Members of the
Society should send their subscriptions to The Secretary, British Psychological Society, Tavistock
House South, Tavistock Square, London, W.C.1; non-members to Messrs. Taylor & Francis,
Ltd., 18 Red Lion Court, Fleet Street, London, E.C.4. Until further notice the Journal will be
issued in two parts each year, in May and November. 


Papers for publication should be sent to Sir Cyril Burt, 9 Elsworthy Road, London, N.W.3.
Authors will receive 25 copies of their articles free; extra copies may be purchased provided the 
order is given when the proofs are returned. Contributors should indicate whether they wish 
their reprints to be bound in covers. The printers will state current prices when sending out proofs. 


Books for review should be sent to the Editor, advertisements to the Secretary, British 
Psychological Society. 


The Society also issues quarterly The British Journal of Psychology and The British Journal 
of Medical Psychology: subscriptions, 60s. per annum for either journal, should be sent to the 
Cambridge University Press, Bentley House, Euston Road, London, N.W.1. The Society 
publishes, jointly with the Association of Teachers in Colleges and Departments of Education, 
The British Journal of Educational Psychology: for this the subscription, £1 per annum, should 
be sent to Methuen & Co., Ltd., 36 Essex Street, London, W.C.2. Members of the Society 
receive the foregoing journals on special terms; enquiries should be addressed to the Secretary, 
British Psychological Society. 


Papers for publication in The British Journal of Psychology should be sent to Professor James 
Drever, Department of Psychology, the University, Edinburgh; those for publication in The 
British Journal of Medical Psychology to Dr. Joseph Sandler, 96 Portland Place, London, W.1; 
those for publication in The British Journal of Educational Psychology to Professor P. E. Vernon, 
London University Institute of Education, Malet Street, London, W.C.1. 


PREPARATION OF PAPERS 


Contributors are asked to send their papers in a form which is ready for submission to the
printer: that is to say, the arrangement of the material should follow that observed in previous
publications of this Journal. Each article should be headed either with a series of numbered
headings in italic, corresponding with the cross-headings of the several sections, or with a brief
abstract (the latter procedure is preferred). Each section should have a short cross-heading, in
capitals; and the whole should conclude with a summary embodying the main conclusions reached.
Special attention should be paid to such details as correct spelling, grammar, capitalization,
numbering, etc. (particularly in the case of tables and references). In preparing manuscripts and
in correcting proofs authors should conform, so far as possible, with the 'Recommendations to
Authors' set out in The Printing of Mathematics by T. W. Chaundy, P. R. Barrett, and Charles
Batey (Oxford University Press, 1954, ch. II, pp. 21-73), or with those given in the paper on 'Setting
Mathematics' (Arthur Phillips, Monotype Recorder, XL, iv, 1956).


Owing to the present conditions for printing, the cost of setting up tables and algebraic 
formulae which require hand composition is very much greater than the cost of machine composition.
Contributors are therefore advised to arrange their material so that it can be printed as economically
as possible. Often numerical results and algebraic formulae can, with a little simplification, be
readily adapted for purposes of machine composition. Manuscripts involving many tables and
equations can as a rule only be accepted if the authors are willing to pay for the additional
cost.


In future authors will receive two complete slip-proofs of their papers, but no page proofs.
They will be expected to pay the cost of all corrections for which they themselves are responsible.


Vol. X        The British Journal of Statistical Psychology        May
Part I                                                              1957


A STUDY OF THE PERFORMANCE OF 2,000 CHILDREN
ON FOUR VOCABULARY TESTS


I. NORMS, WITH SOME OBSERVATIONS ON THE RELATIVE VARIABILITY
OF BOYS AND GIRLS


By M. I. Dunsdon and J. A. Fraser Roberts


Burden Mental Research Department, Stoke Park Hospital, Bristol


I. Introduction. II. Increment of Mean Score with Age. III. Increment of 
Variance of Score with Age. IV. Norms. V. Efficiency of Adjusted Scores. 
VI. Frequency Distribution of Vocabulary Score. VII. Further Examination of 
Variability of Score in Boys and Girls. VIII. Summary. 


I. INTRODUCTION 


Reasons have been given (Dunsdon and Roberts [1]) for choosing vocabulary 
scales as quickly-given individual tests, for the purpose, not of assessing single 
children, but of comparing large groups. For the comparison of groups 
vocabulary tests have manifest advantages in respect of reliability, validity, 
relative absence of practice effect, and above all, the amount of information 
obtained per unit of testing time. The four oral definition vocabularies used 
here are those included in the Terman-Merrill (1937 Revision) [8] Form L,
and Wechsler's Intelligence Scale for Children [9], together with Raven's Mill Hill 
Vocabulary Tests A and B [4]. In the form for oral definition the Mill Hill 
vocabularies were designed by Raven to be used in combination. Accordingly 
in this paper results and norms are given in terms of the combined scores for 
M.H.A. and M.H.B. Norms are therefore presented, by sexes, for 

1. Terman-Merrill Intelligence Scale, Form L (T.M.L.); 

2. Wechsler Intelligence Scale for Children (W.I.S.C.); 

3. Mill Hill A and Mill Hill B Vocabulary Scales combined (M.H.A. and
M.H.B.);

4. All four vocabularies combined. 

The reason for calculating norms is to derive scores which are independent
of age and sex. The requirements for assessing individuals on the one hand and
comparing large groups on the other are very different. For the latter a much
higher standard of accuracy is needed. Relatively small systematic sources
of bias may matter little in an individual measurement, but are important when
the problem is (to quote a concrete example) whether twins are or are not slightly
lower in average performance than their sibs, who are necessarily of different age.
The point is well illustrated in a recent valuable review article by Yates [10],
dealing with the use of vocabulary in the measurement of intellectual deterioration.
He mentions the higher average score of boys, quoting our study and others;
and then adds: "The differences were so slight as to be unimportant". Actually,
on taking the four vocabularies combined, at 6 years boys have a mean score of
26 words and girls 23; at 14 years boys score 76 and girls 71. In terms of 'mental
ages' these are differences of about 5 months in young children and nearly a year
in children approaching 15. Such differences may be unimportant in relation
to the special problem Yates discusses; but they are of the order of magnitude
of some of the differences between large groups of subjects that we had in mind.

Similar considerations apply to standardization for differences in age. 
Rough norms are not enough. Only if it can be shown that the adjustments 
do in fact yield scores which are independent, or almost independent, of differences 
in age and sex, can groups be compared which differ in age and sex composition. 
Otherwise matching becomes necessary, the very complication that norms are designed to
avoid. We have tried to adopt a procedure which uses as much as possible
of the information provided by the sample; and we feel that, in view of the
purpose of the work, this needs no apology. In any case, more accurate norms,
though much more troublesome to compute, are easy enough to use. 

The first step in the procedure which we have adopted is to measure by how
much the number of words correctly defined differs from the number expected
for that sex and that age, the expected number being calculated from the fitted
growth curves. A second step is then required. Variability also increases with
advancing age; hence it is necessary to adjust the deviations from the expected
scores so that these are made equivalent for children of different ages and incidentally
for children of different sex. This can be done by multiplying the deviation
at any observed age by the ratio of some fixed standard deviation to the
standard deviation at that age. The method is perhaps a little cumbersome; but it
has worked well with previous material, and, as will be shown, it also works well
in the present instance. Nor does it seem likely that any rational transformation
of the unit of measurement could make possible a simpler and more direct
approach.

A sex difference in the variability of scores for intelligence tests has often 
been observed, and raises questions which have frequently been discussed, 
though certainly never settled. Thanks to its wide range of age, the present 
sample provides useful material for a further examination of this problem. The 
results will be presented in Section VII.


II. INCREMENT OF MEAN SCORE WITH AGE


In our previous paper [2] linear equations for increment of score with
advancing C.A. were given by sexes for each vocabulary separately and for the
four combined. The fit was good, wi


W.I.S.C. (which in both sexes showed a decreasing rate of increment) and for 
Mill Hill A (which showed the same phenomenon, though, curiously enough, 
in girls only). Quadratics, however, gave an excellent fit in all three instances. 
Although the other remainders were not significant, nevertheless with M.H.A. 
for boys, M.H.B. for girls and with all four vocabularies combined for girls, 
quadratic equations improved the fit, the variance ratio for the addition of the 
quadratic term being significant in all three instances. Where a quadratic is 
used for one sex, it is desirable to use a second-order curve for the other sex as 
well, even though the improvement may be trivial; otherwise the sex difference 
at the extremes of the range of age may be exaggerated. For this reason, and 
for the sake of uniformity, quadratics have been used throughout. The 
importance of securing a good fit for the growth curves of the means, and the 
substantial improvement effected in some instances by the use of quadratics, 
is well brought out in the final testing of the norms by means of contingency 
tables, as described in Section V. 


TABLE I. W.I.S.C. VOCABULARY
Quadratic Curves for Increment of Score. Analysis of Variance

Variation of                  D. of F.   Sum of Squares   Mean Square   Variance   Significance
Vocabulary Score                           (words)²        (words)²       Ratio
BOYS
Linear regression                  1        15,858.3       15,858.3
Addition of quadratic term         1            90.6           90.6        7.95        ++
Remainder between arrays          17           288.0          16.94        1.50        —
Within arrays                    960        10,839.3          11.29
Total                            979        27,076.2
GIRLS
Linear regression                  1        14,086.3       14,086.3
Addition of quadratic term         1           206.6          206.6       23.34        ++
Remainder between arrays          17           101.5           5.97        0.67        —
Within arrays                    947         8,427.0           8.90
Total                            966        22,821.4
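
As a rough check on how the variance ratios for the quadratic term in Table I arise, the sketch below divides the sum of squares for the added term by a pooled error mean square. The pooling of "remainder between arrays" with "within arrays" is an assumption, inferred from the degrees of freedom quoted in the text (1 and 964, that is 17 + 947); the function and variable names are illustrative only.

```python
def quadratic_term_variance_ratio(ss_quadratic, ss_remainder, df_remainder,
                                  ss_within, df_within):
    """Variance ratio for the added quadratic term (1 degree of freedom).

    Assumption: the error term pools 'remainder between arrays' with
    'within arrays', as the quoted degrees of freedom (1 and 964) suggest.
    """
    pooled_mean_square = (ss_remainder + ss_within) / (df_remainder + df_within)
    return ss_quadratic / pooled_mean_square


# Sums of squares and degrees of freedom as printed in Table I (W.I.S.C.).
boys_ratio = quadratic_term_variance_ratio(90.6, 288.0, 17, 10839.3, 960)
girls_ratio = quadratic_term_variance_ratio(206.6, 101.5, 17, 8427.0, 947)

print(f"Boys:  {boys_ratio:.2f}")
print(f"Girls: {girls_ratio:.2f}")
```

On this reading the boys' ratio agrees with the printed 7.95, and the girls' ratio comes out within a unit of the last figure of the printed 23.34.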


The most important discrepancy to be corrected occurred with W.I.S.C.; 
the result of fitting quadratics is shown in Table I. Although M.H.A. is not 
used alone, mention may be made of the disappearance of the previously highly 
significant remainder shown by the girls. The variance ratio for the difference 
between the linear and curvilinear regressions, with degrees of freedom 1 and 
964, is 18.63, and the variance ratio for the remainder, with degrees of freedom
17 and 947, is 1.13, as compared with the previous high figure of 2.10. The
fitting of the growth curves for M.H.A. and M.H.B. combined is shown in Table II.


TABLE II. M.H.A. + M.H.B. VOCABULARIES
Quadratic Curves for Increment of Score. Analysis of Variance

Variation of                  D. of F.   Sum of Squares   Mean Square   Variance   Significance
Vocabulary Score                           (words)²        (words)²       Ratio
BOYS
Linear regression                  1        77,810.3
Addition of quadratic term         1           135.2          135.2        3.23        —
Remainder between arrays          17           906.0          53.29        1.28
Within arrays                    960        39,996.3          41.66
Total                            979       118,847.8
GIRLS
Linear regression                  1        76,542.3
Addition of quadratic term         1           394.2          394.2       12.10        ++
Remainder between arrays          17           531.5          31.27        0.96        —
Within arrays                    947        30,875.8          32.60
Total                            966       108,343.8


The improvement with the four vocabularies combined is rather small. 
But the quadratic term is significant with the girls; and the variance ratio for 
the remainder is reduced from 1.20 to 0.86.

The regression equations used for calculating expected scores are as follows:


T.M.L.        Boys:   y = −0.95 + 0.0855267x + 0.000216918x²,
              Girls:  y = −1.82 + 0.0976822x + 0.000124580x²,
W.I.S.C.      Boys:   y = −1.62 + 0.189335x − 0.000291713x²,
              Girls:  y = −3.69 + 0.215646x − 0.000439474x²,
M.H.A. + B    Boys:   y = −14.18 + 0.350433x − 0.000356116x²,
              Girls:  y = −18.01 + 0.404122x − 0.000607071x²,
4 Combined    Boys:   y = −16.76 + 0.625292x − 0.000430884x²,
              Girls:  y = −23.52 + 0.717450x − 0.000921966x²,

where x = C.A. in completed months and y = score in words.
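
The use of these equations can be illustrated with a short sketch. The coefficients below are a reading of the printed equations (signs and decimal points restored); as a check on that reading, the three component curves should add up, to within rounding, to the curve for all four vocabularies combined. The dictionary and function names are illustrative assumptions, not part of the original procedure.

```python
# Coefficients (constant, linear, quadratic) as read from the printed equations;
# x is chronological age in completed months, y the expected score in words.
MEAN_CURVES = {
    ("T.M.L.", "boys"): (-0.95, 0.0855267, 0.000216918),
    ("T.M.L.", "girls"): (-1.82, 0.0976822, 0.000124580),
    ("W.I.S.C.", "boys"): (-1.62, 0.189335, -0.000291713),
    ("W.I.S.C.", "girls"): (-3.69, 0.215646, -0.000439474),
    ("M.H.A.+B", "boys"): (-14.18, 0.350433, -0.000356116),
    ("M.H.A.+B", "girls"): (-18.01, 0.404122, -0.000607071),
    ("4 combined", "boys"): (-16.76, 0.625292, -0.000430884),
    ("4 combined", "girls"): (-23.52, 0.717450, -0.000921966),
}


def expected_score(scale, sex, months):
    """Expected number of words for the given scale, sex and age in months."""
    a, b, c = MEAN_CURVES[(scale, sex)]
    return a + b * months + c * months ** 2


def component_total(sex, months):
    """Sum of the three component curves (should match '4 combined')."""
    parts = ("T.M.L.", "W.I.S.C.", "M.H.A.+B")
    return sum(expected_score(scale, sex, months) for scale in parts)


for sex in ("boys", "girls"):
    for months in (72, 168):  # 6 years and 14 years
        combined = expected_score("4 combined", sex, months)
        print(sex, months, round(combined, 1), round(component_total(sex, months), 1))
```

At 72 and 168 months the combined curves give about 26 and 76 words for boys and about 23 and 71 for girls, which are the figures quoted in the Introduction.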


III. INCREMENT OF VARIANCE OF SCORE WITH AGE


The expected scores by age and sex having been obtained, the next step is 
to calculate multipliers for making deviations from expectation comparable over 


C.A. with as much accuracy as possible curves are fitted for the regression on C.A. 
of the variance of scores. This procedure was originally used by one of us in 
connection with norms for the Advanced Otis Scale (Roberts, Norman and 
Griffiths [5]), and has since been explained more fully (Roberts and Mellone [6];
Roberts [7]). It is a device that can be applied to any measurements when the
range of age is great enough to yield a considerable increase in variance, and when 
no simple transformation of the units of measurement seems likely to provide 
a better alternative. For example, it has worked well for measurements of 
arterial blood pressure (Hamilton, Pickering, Roberts and Sowry [3]).

It was found that cubics are required. In four of the eight comparisons 
the addition of the cubic term effected significant improvement. What is more 
important is that straight lines or quadratics give expected variances for the 
youngest children which are far too low or even negative; this is shown clearly 
by Fig. 1. 


Fig. 1. Four vocabularies combined.
Increment of variance of score with age. Means of year age groups and fitted curves.
(Vertical axis: variance of score, 0 to 400; horizontal axis: age in years, 5+ to 14+; boys and girls plotted separately.)


TABLE III. REGRESSION OF VARIANCE OF SCORE ON AGE
Significance of Remainders after Fitting Cubics

                              BOYS                            GIRLS
Vocabulary          Variance Ratio  Significance    Variance Ratio  Significance
T.M.L.                   1.72            +               2.44            ++
W.I.S.C.                 2.02            ++              1.36            —
M.H.A. + M.H.B.          1.94            +               0.61            —
4 combined               1.84            +               0.50            —

Degrees of freedom: 16 and ∞ throughout.




A summary of the results is given in Table III. With the girls the fit is
excellent, except for T.M.L. With the boys the fit is not so good, all the
remainders being significant, though not very large. It should be mentioned,
however, that the final adjusted scores are a good deal less sensitive to changes
in expected variances than they are to changes in expected means. Fig. 1,
which shows the observed means of arrays and the fitted curves for all four
vocabularies combined, brings out the closeness of the fit with the girls and
some irregularity with the older boys. These features are also true of the
individual vocabularies taken separately.

TABLE IV. FOUR VOCABULARIES COMBINED
Increment of Variance of Score with C.A. Analysis of Variance

Variation of                 D. of F.   Sum of Squares   Mean Square   Theoretical Variance   Variance   Significance
Variance due to                           (words)⁴        (words)⁴       (D. of F. = ∞)        Ratio
BOYS
Linear regression                1       11,508,784      11,508,784          58,342            197.26        ++
Addition of quadratic term       1           40,055          40,055          93,885              0.43        —
Addition of cubic term           1          388,507         388,507          94,348              4.12        +
Remainder                       16        2,832,823         177,051          96,023              1.84        +
Total                           19       14,770,169
GIRLS
Linear regression                1        5,272,883       5,272,883          33,194            158.85        ++
Addition of quadratic term       1           71,523          71,523          50,037              1.43        —
Addition of cubic term           1          103,369         103,369          50,537              2.05        —
Remainder                       16          408,560          25,535          50,695              0.50        —
Total                           19        5,856,335


Table IV gives, as an example, the analysis of variance for the 4 vocabularies
combined. The unit of grouping for C.A. is 6 months. The other analyses were
fairly comparable.


The regression equations are as follows:


T.M.L.        Boys:   y = 100.88 − 3.02508x + 0.0286268x² − 0.0000778981x³,
              Girls:  y = 95.31 − 2.68802x + 0.0239035x² − 0.0000612672x³,
W.I.S.C.      Boys:   y = 60.88 − 1.73925x + 0.0166865x² − 0.0000464019x³,
              Girls:  y = 15.23 − 0.358567x + 0.00380813x² − 0.0000102841x³,
M.H.A. + B    Boys:   y = 108.20 − 3.28393x + 0.0335929x² − 0.0000890677x³,
              Girls:  y = 2.85 − 0.126640x + 0.00404109x² − 0.00000704577x³,
4 Combined    Boys:   y = 751.17 − 22.7530x + 0.223352x² − 0.000609357x³,
              Girls:  y = 252.34 − 7.28021x + 0.0745683x² − 0.000185767x³,

where x = age in completed months and y is variance of score in (words)².
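
A minimal sketch of how a multiplier might be obtained from one of these cubics is given below, taking the boys' curve for the four vocabularies combined and a fixed standard deviation of 15. Two points are assumptions rather than statements from the paper: that the multiplier is the fixed standard deviation divided by the fitted standard deviation, and that the fitted variance is evaluated at the mid-point of each 6-month group (86.5 months for 7:0-7:5, 92.5 months for 7:6-7:11). The names used are illustrative.

```python
import math

# Cubic for the variance of the combined score, boys, as read from the
# printed equations; x is age in completed months, y is in (words)^2.
VARIANCE_CUBIC_BOYS = (751.17, -22.7530, 0.223352, -0.000609357)
FIXED_SD = 15.0  # standard deviation to which the combined scores are adjusted


def fitted_variance(months, coeffs=VARIANCE_CUBIC_BOYS):
    """Variance of score expected at the given age, from the fitted cubic."""
    a, b, c, d = coeffs
    return a + b * months + c * months ** 2 + d * months ** 3


def multiplier(months, coeffs=VARIANCE_CUBIC_BOYS, fixed_sd=FIXED_SD):
    """Fixed standard deviation divided by the fitted standard deviation."""
    return fixed_sd / math.sqrt(fitted_variance(months, coeffs))


# Assumed group mid-points: 86.5 months for 7:0-7:5, 92.5 months for 7:6-7:11.
for label, midpoint in (("7:0-7:5", 86.5), ("7:6-7:11", 92.5)):
    print(label, round(multiplier(midpoint), 2))
```

Under these assumptions the two values come out at about 1.94 and 1.73, the multipliers quoted for these age-groups in Section IV.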




IV. NORMS


Each of the tables printed in the Appendix is divided into two parts. The 
first gives for each vocabulary, or group of vocabularies, C.A. in years and months, 
with the corresponding number of words; from these are read off the number 
of words by which the observed score exceeds or falls short of expectation. The 
second gives the multipliers which apply to successive 6-month age-groups. 
The observed deviation, plus or minus, is multiplied by the appropriate multiplier ; 
and the final adjusted scores are thus made comparable for children of different 
ages and opposite sex. 

The choice of the fixed standard deviations to which adjustment is made is 
explained in the notes appended to the tables of norms. All have a direct relation 
to the deviation from expectation in number of words as at some particular age 
within the range of the sample. We ourselves are using, with one exception, 
the plus or minus scores just as they stand. They have, of course, a mean of 0 for 
our sample. With the 4 vocabularies combined, however, a standard deviation of 15
is within the age range; so in this instance we add 100, thereby securing a frequency
distribution with the conventional mean of 100 and standard deviation of 15.

One point in the calculation of the multipliers needs explanation. Some of
the cubics are rather sharply inflected near the lower end of the range, which
means that, if multipliers were given for each 6-month age-group, they would
first of all increase rather rapidly from the group 5:0-5:5 before decreasing again.
These tails are clearly rather unstable; so it seemed best to average the variances
for successive groups, starting with the youngest, until the next older group
showed a decrease from those younger groups so far combined. The six-month
age-groups for multipliers may be thought rather coarse. With younger children
the values are changing rapidly; with the 4 vocabularies combined the largest
difference is 0.21, as the multiplier is 1.94 for boys of 7:0-7:5, and 1.73 for boys of
7:6-7:11. When the observed C.A. lies towards the end of a 6-month group,
interpolation can be used for greater accuracy, though it seems hardly necessary.
In the whole of the sample of 1,947 children, there are only 4 instances where
interpolation gives a difference of as much as 2 points of score. To give an
example, a girl aged 7:11 defined 66 words, 30 words above expectation; the
application of the multiplier gives her an adjusted score of +54; and this is
merely reduced to +52 on interpolation.
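
The worked example of the girl aged 7:11 can be retraced with the fitted curves for girls on the four vocabularies combined. The sketch below is hedged in three respects: the expected score is rounded to a whole word (as a table of norms presumably would be), the group multiplier is taken at an assumed mid-point of 92.5 months, and evaluating the cubic at the exact age is used only as a stand-in for the interpolation described above. Names are illustrative.

```python
import math

# Girls, four vocabularies combined; coefficients as read from the printed equations.
MEAN_QUADRATIC = (-23.52, 0.717450, -0.000921966)             # words
VARIANCE_CUBIC = (252.34, -7.28021, 0.0745683, -0.000185767)  # (words)^2
FIXED_SD = 15.0


def polynomial(coeffs, x):
    """Evaluate c0 + c1*x + c2*x^2 + ... for the given coefficients."""
    return sum(c * x ** power for power, c in enumerate(coeffs))


def adjusted_score(observed_words, age_months, variance_age_months):
    """Deviation from expectation, rescaled to the fixed standard deviation."""
    expected = round(polynomial(MEAN_QUADRATIC, age_months))  # assumed: whole words
    deviation = observed_words - expected
    fitted_sd = math.sqrt(polynomial(VARIANCE_CUBIC, variance_age_months))
    return deviation * FIXED_SD / fitted_sd


age = 7 * 12 + 11                         # 7:11 is 95 completed months
grouped = adjusted_score(66, age, 92.5)   # assumed mid-point of group 7:6-7:11
at_exact_age = adjusted_score(66, age, age)
print(round(grouped), round(at_exact_age))
```

With these assumptions the expected score is 36 words, the deviation +30, and the two adjusted scores round to +54 and +52, in agreement with the figures given in the text.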


V. EFFICIENCY OF ADJUSTED SCORES 


It is desirable to see whether the adjusted scores are indeed achieving their 
purpose, that is, whether in fact they are independent of age and sex. Average 
allowance has been made for mean and variance; but there may be non-normal 
variations, for example skewness, which are different at different ages or in the 
two sexes. Accordingly, adjusted scores have been worked out for the 1,947 
children making up the sample. Taking the sex comparisons first, the results 
are shown in Table V; no difference is significant; so that the adjusted scores 
are comparable for boys and girls. 




TABLE V. ADJUSTED SCORES. COMPARISON OF SEXES

                           Mean                 S.D.
Vocabulary            Boys      Girls      Boys      Girls
T.M.L.               +0.030     0.000      3.533     3.449
W.I.S.C.             −0.011    +0.021      3.002     3.007
M.H.A. + M.H.B.      −0.045    −0.051      5.803     5.716
4 Vocabularies       +0.006    −0.148     15.034    14.920

All differences non-significant


Independence of C.A. can be tested by forming contingency tables. In 
view of the results of Table V the scores of boys and girls can be added. The 
units of grouping are two years of age and 2 words, except for the 4 vocabularies 
combined, in which 5-word grouping is used. The results are shown in Table VI. 


TABLE VI. TESTS OF INDEPENDENCE OF ADJUSTED SCORE AND C.A.: SUMMARY

Vocabulary                  D. of F.    Chi Squared    P
T.M.L.                         28          78.77       <0.00001
W.I.S.C.                       28          44.23       0.03
M.H.A. + M.H.B.                52          54.23       0.39
4 Vocabularies Combined        52          52.13       0.47


It will be seen that M.H.A. + M.H.B. and the 4 vocabularies combined show 
almost perfect independence of C.A. and score. With W.I.S.C. chi-squared 
is significant, but not highly so. With T.M.L., on the other hand, there is a 
large and highly significant departure from independence. If the calculation 
is repeated, however, leaving out the youngest 2-year group, chi-squared falls 
to a value below the 5 per cent level of significance. The departure from 
independence, therefore, does not seem serious, as no one would want to use in 
isolation a single vocabulary of this kind for very young children. 
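
The probabilities in Table VI can be checked approximately without statistical tables. The sketch below uses the closed form of the upper tail of the chi-squared distribution that holds when the degrees of freedom are even, which covers both 28 and 52; the function name is an illustrative assumption.

```python
import math


def chi_squared_p_value(chi_squared, df):
    """Upper-tail probability of chi-squared for an even number of degrees of freedom.

    Uses P(X > x) = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!,
    which is exact only when df is even.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = chi_squared / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Chi-squared values and degrees of freedom as printed in Table VI.
for name, value, df in (("T.M.L.", 78.77, 28), ("W.I.S.C.", 44.23, 28),
                        ("M.H.A. + M.H.B.", 54.23, 52), ("4 combined", 52.13, 52)):
    print(f"{name:16s} {chi_squared_p_value(value, df):.5f}")
```

The results should fall close to the printed values of P: below 0.00001 for T.M.L., and roughly 0.03, 0.39 and 0.47 for the other three rows.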

As might be expected, the substitution of quadratic for linear growth curves 
for the means makes practically no difference when the improvement is small. 
Contingency tables based on the linear equations for T.M.L. and for the 
4 vocabularies combined give values for chi-squared which are closely similar 
to those of Table VI. On the other hand, with M.H.A. and M.H.B. there was 
an appreciable falling off in rate of increment, which leads to a sharp increase 
in chi-squared from the 54.23 shown in Table VI to 82.24 if linear curves are used.
No doubt W.I.S.C. would show the same effect; but we have not made this
calculation. This illustrates the importance of securing a good fit and the 
inefficiency that would result from using approximate straight lines when a 
growth curve is appreciably curvilinear. 




VI. FREQUENCY DISTRIBUTION OF VOCABULARY SCORE 


With any intelligence scale designed to grade the whole population it is 
desirable that the scores should have an approximately normal frequency 
distribution. The adjusted scores for the 4 vocabularies combined give a single
frequency distribution for the whole sample; they are no more than deviations 
from the regression line, further adjusted to equalize for differing variances at 
different ages and for different sexes. Departures from normality, such as 
skewness, will not be affected. Smoothed frequencies, with single words of 
score, as described in the preceding section must be used; and, although this 
may somewhat affect the degrees of freedom, the comparison should be broadly
admissible.


Fig. 2. Four vocabularies combined.
Frequency distribution by 5-point groups of age- and sex-adjusted scores, with fitted normal
curve. (Horizontal axis marked from −3σ to +3σ.)


Fig. 2 shows the observed frequencies in 5-word groups, together with the 
fitted normal curve. Testing for normality by means of chi-squared, the central 
16 5-word groups can be treated separately; the tails below −40 words and above
+39 words, very nearly ±2⅔σ, must be lumped together to secure sufficient
numbers. There are thus 18 arrays and, subject to the reservation made above
about smoothing, 15 degrees of freedom. Chi-squared is 28.53, corresponding
to a p of very nearly 0.02. The frequency distribution is thus approximately
normal, and it therefore appears that scores on well constructed oral definition
tests are as normally distributed as can reasonably be expected. The biggest
contribution to chi-squared, 8.19, is due to an excess of scores above +39. The
observed number is 16.75 against an expectation of 8.438. This is a feature of
many intelligence tests; it is difficult to design a scale which does not allow a few
very bright children to secure unduly high scores. The group +15 to +19
shows a deficiency; but the general feature that stands out in Fig. 2 is a slight
tendency for the frequency curve to be flat-topped at about and rather below the
mean. This phenomenon may be real, as it not infrequently appears with scales
of different kinds.
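
The contribution of a single group to chi-squared is the squared difference between the observed and expected numbers, divided by the expected number. The short sketch below applies this to the upper tail quoted above (observed 16.75, expected 8.438); the function name is illustrative.

```python
def chi_squared_contribution(observed, expected):
    """Contribution of one group to chi-squared: (O - E)^2 / E."""
    return (observed - expected) ** 2 / expected


# Smoothed observed and expected numbers for scores above +39, as quoted in the text.
upper_tail = chi_squared_contribution(16.75, 8.438)
print(round(upper_tail, 2))
```

This gives about 8.19, the figure stated for the excess of scores above +39.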


VII. FURTHER EXAMINATION OF VARIABILITY OF SCORE IN BOYS AND GIRLS

The question of the relative variabilities of boys and girls in intelligence 
test scores has attracted interest over many years. The present sample has a 
wide range of age; and so is useful for examining the problem. The changing 
pattern as age increases is shown in Table VII.


TABLE VII. VARIABILITY OF SCORE

  Age         Mean            S.D.          C.O.V. %       Ratio, Boys : Girls
(Years)   Boys   Girls    Boys   Girls    Boys   Girls    Mean    S.D.   C.O.V.
  5+      22.7   20.5      7.3    6.1     32.0   29.7     1.10    1.19    1.08
  6+      29.5   27.0      6.5    7.7     21.9   28.6     1.09    0.84    0.76
  7+      34.9   31.6      8.2    8.6     23.5   27.2     1.10    0.95    0.86
  8+      42.8   39.7     11.4   10.1     26.6   25.4     1.08    1.13    1.05
  9+      48.0   45.5     11.4    9.6     23.8   21.0     1.06    1.20    1.13
 10+      57.0   53.5     15.9   12.7     27.9   23.8     1.07    1.25    1.17
 11+      58.6   57.4     16.2   13.2     27.7   23.1     1.02    1.23    1.20
 12+      67.7   64.2     16.6   14.2     24.5   22.1     1.05    1.17    1.11
 13+      73.8   66.5     19.1   15.8     25.8   23.8     1.11    1.20    1.08
 14+      77.7   73.4     18.2   16.2     23.4   22.1     1.06    1.12    1.06

All figures based on raw scores.


The means and standard
deviations are based on the simple raw scores by year age-groups for all four
vocabularies combined. As was shown in the previous paper, boys have a higher 
mean score at all ages, though, as the eighth column shows, the relative excess 
tends to decline. With 6 and 7-year-old children the standard deviation is higher 
in girls, but thereafter is considerably higher in boys. The coefficient of variation 
gives the clearest picture. The ratios shown in the last column reveal a clear 
trend. In the youngest age-group, 5:0-5:11, boys have the higher C.O.V.;
but it would be unsafe to deduce too much from tests given to such young children.
In the seventh year girls are much more variable. At about 8 or 9 years equality
is reached; and thereafter the variability of boys relative to that of girls increases
steadily to about 11 or 12 years; it then declines just as steadily to age 15.

The ratios of the C.O.V.'s are shown in Fig. 3, where the figures of the last
column of Table VII have been plotted. A continuous line is shown which
represents the ratio of the C.O.V.'s when calculated from the curves fitted for
increment with age of mean and variance. The trends are smoothed; but in
fact little is added to the clear evidence of the figures based on the simple raw
scores.
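
The continuous line of Fig. 3 can be sketched from the fitted curves of Sections II and III, taking the coefficient of variation at each age as 100 times the fitted standard deviation divided by the fitted mean. The coefficients are a reading of the printed equations for the four vocabularies combined, and the function names are illustrative assumptions.

```python
import math

# Four vocabularies combined; coefficients as read from the printed equations.
MEAN = {"boys": (-16.76, 0.625292, -0.000430884),
        "girls": (-23.52, 0.717450, -0.000921966)}
VARIANCE = {"boys": (751.17, -22.7530, 0.223352, -0.000609357),
            "girls": (252.34, -7.28021, 0.0745683, -0.000185767)}


def polynomial(coeffs, x):
    """Evaluate c0 + c1*x + c2*x^2 + ... for the given coefficients."""
    return sum(c * x ** power for power, c in enumerate(coeffs))


def coefficient_of_variation(sex, months):
    """Fitted C.O.V. (per cent) at the given age in completed months."""
    fitted_sd = math.sqrt(polynomial(VARIANCE[sex], months))
    return 100.0 * fitted_sd / polynomial(MEAN[sex], months)


def cov_ratio(months):
    """Ratio of fitted C.O.V.'s, boys to girls."""
    return coefficient_of_variation("boys", months) / coefficient_of_variation("girls", months)


for years in (6, 8, 10, 12, 14):
    print(years, round(cov_ratio(12 * years), 2))
```

On this reading the smoothed ratio lies below 1 at 6 years, rises above 1 in the later years and is lower again at 14 than at about 11 or 12, which is the trend described in the text.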




It is tempting to conclude that, in relative variability, vocabulary tests may 
be typical of intelligence scales in general. If so, the greater variability of boys 
may be a passing phase in development, absent in the young child, reaching 
a maximum at 11 or 12 years, and perhaps (though this is an extrapolation) 
disappearing by the time adult life is reached. The phenomenon of greater male 
variability is particularly striking, because most of the evidence comes from group 
tests administered at about the age when the difference is maximal. All this is
somewhat speculative; but we hope that our data make some contribution, 
and that others will provide further evidence based on other intelligence scales. 
The problem is important. If indeed the very real greater male variability is 
a passing phase, then allowance is rightly made by scoring boys and girls separately 
in scholarship examinations, and basing grammar school places, for example, 
on an equal proportion of each sex, and not on a dividing line of score applied to 
both boys and girls. 


Fig. 3. Four vocabularies combined.
Ratio of coefficients of variation (boys/girls). Circles based on raw scores by year age
groups; continuous line based on fitted curves for increment of mean and variance
with age. (Vertical axis: C.O.V. ratio, boys/girls; horizontal axis: age in years, 5 to 15.)


VIII. SUMMARY

1. A 3 per cent sample of the school children of the City and County of 
Bristol was selected by visiting all the schools, as well as a number outside, and 
testing children between the ages of 5:0 and 14:11 years whose homes were within
the city and whose birthdays fell on the first day of any calendar month. Four 
vocabulary scales were used: that from the Terman-Merrill Scale (Form L), 
Mill Hill Vocabularies A and B, and that from the Wechsler Intelligence Scale 
for Children. 

2. The present paper gives norms for the vocabularies from the
Terman-Merrill and Wechsler Intelligence Scales, the oral definitions form
of Mill Hill Vocabulary Scales A and B, and for all four vocabularies combined.
In view of the sex difference, norms are given separately for boys and girls.




3. Curves have been fitted for the increment of score with advancing age; 
and tables are given from which deviations from expectation can be read off. 

4. Further curves have been fitted for the increment of variance with 
advancing age. Deviations from expectation can be made comparable for children
of different ages by multiplying by the ratio of a fixed standard deviation to the
standard deviation at observed age. Tables of multipliers are given for making
this further adjustment, which also makes scores comparable for children of 
opposite sex. 

5. It is shown that for this sample the adjusted scores are, with small 
exceptions, independent of age at test, and that they equalize for children of 
opposite sex. 

6. The frequency distribution is substantially normal. 

7. In terms of the coefficient of variation girls were more variable than boys
in the younger age-groups. At about 8 or 9 years equality is reached; and
thereafter boys become progressively more variable than girls up to the age of
11 or 12. Beyond this age the difference between the sexes is progressively
reduced.


ACKNOWLEDGMENTS 


We are most grateful to Professor Lewis M. Terman and to the Houghton 
Mifflin Company for permission to use the Vocabulary test from the Terman- 
Merrill Intelligence Scale [8]; to Mr. J. C. Raven [4] for permission to use the 
Oral Definitions Form of his Mill Hill Vocabulary Scale; and to Dr. David 
Wechsler [9] for permission to use the Vocabulary from his Intelligence Scale for 
Children. Special thanks are due to the Bristol Education Committee and to 
the Chief Education Officer, Mr. G. H. Sylvester, who gave permission for work 
to be carried out in their schools, and in particular to the Headmasters and 
Headmistresses both of the Local Education Authority's schools and of 
independent schools, for their willing and active co-operation. 


APPENDIX 


Notes to Table VIII 

1. To obtain scores adjusted for differences in C.A. and sex, read off expected number 
of words for observed C.A. Multiply the difference (plus or minus) between expected and 
obtained number of words by multiplier corresponding to observed C.A. 

2. For 4 vocabularies, combined scores are adjusted to a standard deviation of 15, 
which is that shown by boys of about 10 years 8 months and by girls of about 13 years 
0 months. Mean of zero can be changed to 100 by adding 100 algebraically to the adjusted
score.

3. For M.H.A. + M.H.B. scores are adjusted to a standard deviation of 5.7151, which
is the root mean variance of girls in the experimental sample, and corresponds to a C.A. of
about 9 years 0 months for boys and about 9 years 9 months for girls.

4. For T.M.L. the scores are adjusted to a standard deviation of 3.4357, which is the
root mean variance of girls as before.

5. For W.I.S.C. the scores are adjusted to a standard deviation of 2.9851, which is the
root mean variance of girls as before.
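
The adjustment described in Notes 1 and 2 can be sketched in a few lines of Python. This is only an illustration: the function name and the expected and obtained word counts are hypothetical, while the multiplier 1.07 is the tabled value for boys aged 10-0 to 10-5 on the four vocabularies combined.

```python
def adjusted_score(obtained, expected, multiplier, mean=0.0):
    """Deviation from the expected number of words, rescaled by the
    multiplier for the child's C.A. and sex (Note 1), then shifted to
    the chosen mean (Note 2 suggests 100)."""
    deviation = obtained - expected
    return mean + deviation * multiplier


# Hypothetical boy aged 10-3: 70 words obtained against 60 expected.
# The multiplier 1.07 is read from the table; the word counts are invented.
example = adjusted_score(obtained=70, expected=60, multiplier=1.07, mean=100.0)
print(round(example, 1))
```

A child scoring exactly at expectation receives the mean itself, whatever the multiplier.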


A Study of Four Vocabulary Tests 


TABLE VIII. NORMS: EXPECTED SCORES AND MULTIPLIERS


T.M.L.
C.A.
Boys            Girls
                5-0
5-0 - 5-4       5-1 - 5-8
5-5 - 6-1       5-9 - 6-5
6-2 - 6-9       6-6 - 7-2
6-10 - 7-5      7-3 - 7-10
7-6 - 8-1       7-11 - 8-6
8-2 - 8-9       8-7 - 9-2
8-10 - 9-5      9-3 - 9-10
9-6 - 10-0      9-11 - 10-6
10-1 - 10-7     10-7 - 11-1
10-8 - 11-2     11-2 - 11-9
11-3 - 11-9     11-10 - 12-5
11-10 - 12-4    12-6 - 13-0
12-5 - 12-10    13-1 - 13-7
12-11 - 13-5    13-8 - 14-2
13-6 - 13-11    14-3 - 14-9
14-0 - 14-6     14-10 - 14-11
14-7 - 14-11
C.A.
5-0 - 5-5
5-6 - 5-11
6-0 - 6-5
6-6 - 6-11
7-0 - 7-5
7-6 - 7-11
8-0 - 8-5
8-6 - 8-11
9-0 - 9-5
9-6 - 9-11
10-0 - 10-5
10-6 - 10-11
11-0 - 11-5
11-6 - 11-11
12-0 - 12-5
12-6 - 12-11
13-0 - 13-5
13-6 - 13-11
14-0 - 14-5
14-6 - 14-11


W.I.S.C.
Exp'd
No. of     C.A.
Words      Boys            Girls
                           5-0 - 5-5
 4         5-0 - 5-5       5-6 - 5-11
 5         5-6 - 6-0       6-0 - 6-6
 6         6-1 - 6-6       6-7 - 7-1
 7         6-7 - 7-1       7-2 - 7-8
 8         7-2 - 7-9       7-9 - 8-4
 9         7-10 - 8-4      8-5 - 9-0
10         8-5 - 9-0       9-1 - 9-8
11         9-1 - 9-8       9-9 - 10-5
12         9-9 - 10-5      10-6 - 11-3
13         10-6 - 11-1     11-4 - 12-2
14         11-2 - 11-11    12-3 - 13-3
15         12-0 - 12-8     13-4 - 14-5
16         12-9 - 13-7     14-6 - 14-11
17         13-8 - 14-6
18         14-7 - 14-11
19
20
21
MULTIPLIERS
T.M.L. W.I.S.C.
Boys Girls Boys Girls
1.31
1.31
2.03   2.00   1.31   1.28
1.25
1.99   1.30   1.21
1.65   2.00   1.21   1.17
1.38   1.74   1.12   1.13
1.18   1.46   1.03   1.08
1.03   1.26   0.95   1.04
0.92   1.10   0.89   1.00
0.84   0.99   0.83   0.97
0.78   0.90   0.79   0.93
0.73   0.83   0.75   0.91
0.69   0.78   0.72   0.88
0.66   0.73   0.70   0.86
0.64   0.70   0.68   0.84
0.62   0.68   0.67   0.82
0.62   0.66   0.67   0.81
0.62   0.65   0.68   0.80
0.62   0.64   0.69   0.80




C.A. 


Boys 


9-8 - 9-11
10-0 - 10-2
10-3 - 10-6


C.A.
5-0 - 5-5
5-6 - 5-11
6-0 - 6-5
6-6 - 6-11
7-0 - 7-5

7-6 - 7-11
8-0 - 8-5
8-6 - 8-11
9-0 - 9-5
9-6 - 9-11




10-3 - 10-6
10-7 - 10-10


Boys

} 1.63

1.57
1.47
1.35

1.24
1.13
1.04
0.96
0.90


M.H.A.+M.H.B.
Exp'd
No. of
Words
4
5     10-7 - 10-10
6     10-11 - 11-2
7     11-3 - 11-6
8     11-7 - 11-10
9     11-11 - 12-2
10    12-3 - 12-6
11    12-7 - 12-10
12    12-11 - 13-2
13    13-3 - 13-7
14    13-8 - 13-11
15    14-0 - 14-3
16    14-4 - 14-8
17    14-9 - 14-11
18
19
20
21
22
23
24
MULTIPLIERS 


M.H.A.+M.H.B.


Girls
1.90
1.73


Girls

10-11 - 11-3
11-4 - 11-7
11-8 - 11-11
12-0 - 12-3
12-4 - 12-8

12-9 - 13-1
13-2 - 13-5
13-6 - 13-10
13-11 - 14-3
14-4 - 14-8

14-9 - 14-11


C.A.            Boys    Girls
10-0 - 10-5     0.84    0.97
10-6 - 10-11    0.79    0.92
11-0 - 11-5     0.75    0.88
11-6 - 11-11    0.72    0.85
12-0 - 12-5     0.69    0.82

12-6 - 12-11    0.67    0.79
13-0 - 13-5     0.65    0.76
13-6 - 13-11    0.64    0.74
14-0 - 14-5     0.63    0.72
14-6 - 14-11    0.62    0.69


Boys 




Girls
5-0
5-1 - 5-2
5-3
5-4 - 5-5
5-6 - 5-7
5-8
5-9 - 5-10
5-11 - 6-0
6-1
6-2 - 6-3
6-4 - 6-5
6-6 - 6-7
6-8
6-9 - 6-10
6-11 - 7-0
7-1 - 7-2
7-3 - 7-4
7-5
7-6 - 7-7
7-8 - 7-9
7-10 - 7-11
8-0 - 8-1
8-2 - 8-3
8-4
8-5 - 8-6
8-7 - 8-8
8-9 - 8-10
8-11 - 9-0
9-1 - 9-2
9-3 - 9-4
9-5 - 9-6
9-7 - 9-8
9-9 - 9-10
9-11 - 10-0


FOUR VOCABULARIES COMBINED


Exp'd
No. of
Words


Boys

9-8
9-9 - 9-10
9-11 - 10-0
10-1 - 10-2
10-3 - 10-4

10-5 - 10-6
10-7 - 10-8
10-9 - 10-10
10-11 - 11-0
11-1 - 11-2

11-3 - 11-4
11-5 - 11-6
11-7 - 11-8
11-9 - 11-10
11-11 - 12-0

12-1 - 12-2
12-3 - 12-4
12-5 - 12-6
12-7 - 12-8
12-9 - 12-10

12-11 - 13-0
13-1 - 13-2
13-3 - 13-4
13-5 - 13-6
13-7 - 13-8

13-9 - 13-10
13-11 - 14-0
14-1 - 14-2
14-3 - 14-4
14-5 - 14-7

14-8 - 14-9
14-10 - 14-11


C.A.
Girls

10-1 - 10-2
10-3 - 10-4
10-5 - 10-6
10-7 - 10-8
10-9 - 10-10

10-11 - 11-0
11-1 - 11-2
11-3 - 11-5
11-6 - 11-7
11-8 - 11-9

11-10 - 11-11
12-0 - 12-1
12-2 - 12-4
12-5 - 12-6
12-7 - 12-8

12-9 - 12-10
12-11 - 13-1
13-2 - 13-3
13-4 - 13-5
13-6 - 13-8

13-9 - 13-10
13-11 - 14-1
14-2 - 14-3
14-4 - 14-6
14-7 - 14-8

14-9 - 14-11




MULTIPLIERS: FOUR VOCABULARIES COMBINED


C.A.            Boys    Girls
5-0 - 5-5               2.28
5-6 - 5-11      2.19    2.27
6-0 - 6-5               2.19
6-6 - 6-11      2.14    2.07
7-0 - 7-5       1.94    1.93
7-6 - 7-11      1.73    1.79
8-0 - 8-5       1.54    1.66
8-6 - 8-11      1.39    1.55
9-0 - 9-5       1.26    1.45
9-6 - 9-11      1.16    1.36

10-0 - 10-5     1.07    1.28
10-6 - 10-11    1.01    1.21
11-0 - 11-5     0.95    1.15
11-6 - 11-11    0.91    1.10
12-0 - 12-5     0.87    1.06
12-6 - 12-11    0.85    1.02
13-0 - 13-5     0.83    0.98
13-6 - 13-11    0.81    0.96
14-0 - 14-5     0.81    0.93
14-6 - 14-11    0.81    0.91

REFERENCES 


[1] Dunsdon, M. I., and Roberts, J. A. F. (1953). ‘The relation of the Terman-
Merrill vocabulary test to mental age in a sample of English children.’ Brit. J.
Stat. Psychol., VI, 61-70.

[2] Dunsdon, M. I., and Roberts, J. A. F. (1955). ‘A study of the performance of
2,000 children on four vocabulary tests.’ Brit. J. Stat. Psychol., VIII, 3-15.

[3] Hamilton, M., Pickering, G. W., Roberts, J. A. F., and Sowry, G. S. C. (1954).
‘The aetiology of essential hypertension: (2) Scores for arterial blood pressure
adjusted for differences in age and sex.’ Clin. Sci., XIII, 37-49.

[4] Raven, J. C. (1948). The Mill Hill Vocabulary Scale. London: H. K. Lewis.

[5] Roberts, J. A. F., Norman, R. M., and Griffiths, R. (1935). ‘Studies on a child
population: (i) definition of the sample, method of ascertainment, and analysis of
the results of a group intelligence test.’ Annals of Eugenics, VI, 319-338.

[6] Roberts, J. A. F., and Mellone, M. A. (1952). ‘On the adjustment of Terman-
Merrill I.Q.s to secure comparability at different ages.’ Brit. J. Stat. Psychol.,
V, 65-79.

[7] Roberts, J. A. F. (1953). ‘The use of regressions involving variances of dependent
variates for calculating age corrected scores.’ Biometrics, IX, 267.

[8] Terman, L. M., and Merrill, M. A. (1937). Measuring Intelligence. London:
Harrap.

[9] Wechsler, D. (1949). Intelligence Scale for Children. New York: The Psychological
Corporation.

[10] Yates, A. J. (1956). ‘The use of vocabulary in the measurement of intellectual
deterioration: A review.’ J. Ment. Sci., CII, 409-440.


Vol. X The British Journal of Statistical Psychology May 
Part I 1957 


PSYCHOLOGICAL FACTORS: 
THEIR NUMBER, NATURE, AND IDENTIFICATION 


By G. BERNYER 
Centre National de la Recherche Scientifique, Paris 


Summary 


The following paper presents a brief survey of current factorial procedures,
with a discussion of their chief advantages and limitations. It is intended
primarily for research workers in general psychology who desire to analyse the
factors involved in their data without making a detailed study of all available
procedures. An attempt is made to determine the commoner faults, either of
theory or of practice, which have rendered so many recent factorial investigations
contradictory or inconclusive; and a number of defects are noted in existing
techniques which appear to call for further investigation by the mathematical
theorist. The main conclusion is that, like other statistical procedures, factor
analysis is merely an adjunct to psychological research, not a method sufficient in
itself.¹


THE CHOICE OF PROCEDURES


During the past fifty years, factor analysis has undergone a rapid development 
and provoked a series of perplexing controversies. The literature, even if we
include only publications dealing with factorial procedures and disregard those 
summarizing actual results, is now so vast that the investigator finds himself 
confronted by a formidable mass of contributions, which recommend widely 
different techniques and often lead to conclusions that are entirely contradictory. 
The psychologist who does not wish to embark on a comprehensive study of the 
whole subject, and wants merely to select the method most appropriate to his 
own immediate problem, is faced with a bewildering task if he is to appreciate 
the technical difficulties, the psychological implications, and the special limitations 
of the particular procedure that he has decided to apply. It may therefore be 
helpful to attempt a brief review of the practical questions encountered by the 
ordinary worker who desires to choose and apply some suitable technique. 
I shall confine my remarks chiefly to investigations involving tests of ability or 
aptitude, and shall deal chiefly with the two methods that are most frequently 
used by the psychologist, namely, the method of general and group factor analysis 
on the one hand, and the method of multifactorial analysis (including rotation
to simple structure) on the other.


¹ Editorial Note. Both the author and the editor are indebted to the authorities of the Centre
National de la Recherche Scientifique for permission to publish this paper here. The French version
was originally presented at the International Colloquium on Factor Analysis, arranged by the Centre,
and published, with a full account of the discussion that followed, in the Report of their proceedings
(Paris, 1955). Many requests have already been received for a copy to be available in English. We are
grateful to Mlle. G. Régnier for undertaking the translation, and to Dr. Charlotte Banks for revising
the manuscript. Students and others can obtain reprints at cost price by writing to the Assistant
Editor, Dr. Charlotte Banks, University College, London.




FUNDAMENTAL ASSUMPTIONS

Both procedures start with the broad hypothesis that the marks obtained by 
a particular examinee in a particular test may, to a first approximation, be treated 
as a linear function of certain hypothetical variables or ‘factors’; and both 
recognize two types of factor, namely, (a) ‘common factors’, i.e. those that appear 
either in all the tests (the general factor) or in some of the tests (group factors), 
and (b) ‘specific factors’, i.e. those peculiar to some single test alone. All such 
factors, except perhaps those that are invoked in the later stages of multifactorial 
analysis, are assumed to be statistically independent.

Here it is too often forgotten that, even when we restrict ourselves to this 
apparently simple hypothesis, the problem of determining the factors still possesses 
an infinity of solutions. To meet this difficulty and limit the number of solutions 
available, most writers commonly invoke a ‘principle of economy’, very much 
the same in both procedures. This principle requires us to extract a minimum 
number of common factors. By thus minimizing the common factor variance, 
however, we maximize the specific factor variance. Moreover, if each of the
two alternative procedures takes as its point of departure one and the same
bipolar analysis, the results turn out to be statistically equivalent; we can pass
from one set to the other by an appropriate rotation. On the other hand, the
factors actually furnished by the two methods of analysis differ widely from each
other in consequence of the further postulates which each imposes in order to
reach the final solution.
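
The remark that the factorial problem has an infinity of solutions, any two of which are connected by a rotation, can be illustrated numerically. The following Python sketch assumes two orthogonal common factors and a set of purely hypothetical saturations; it shows that rotating the factor axes leaves the reproduced correlations unchanged, so that the correlations alone cannot decide between the two sets of factors.

```python
import math


def reproduced_correlations(loadings):
    """Correlations implied by orthogonal common factors:
    r_ij = sum over factors k of a_ik * a_jk (diagonal = communalities)."""
    n = len(loadings)
    return [[sum(a * b for a, b in zip(loadings[i], loadings[j]))
             for j in range(n)] for i in range(n)]


def rotate(loadings, degrees):
    """Rotate a two-factor table of saturations through the given angle."""
    c = math.cos(math.radians(degrees))
    s = math.sin(math.radians(degrees))
    return [[a * c + b * s, -a * s + b * c] for a, b in loadings]


# Hypothetical saturations of four tests: a general factor and a bipolar factor.
loadings = [[0.8, 0.3], [0.7, 0.4], [0.6, -0.2], [0.5, -0.5]]
rotated = rotate(loadings, 30)

before = reproduced_correlations(loadings)
after = reproduced_correlations(rotated)
max_gap = max(abs(before[i][j] - after[i][j])
              for i in range(4) for j in range(4))
print(max_gap < 1e-12)
```

The saturations themselves change under rotation, which is why further postulates are needed to single out one solution.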


ANALYSIS BY GENERAL AND GROUP FACTORS 

The ‘two-factor theory’ of Spearman [1] sought to interpret the marks
obtained in a test by each examinee by means of a single general factor only,
together with a factor specific to the test. This seemed to provide a solution of
the factorial problem which was attractive at once by its neatness and by its
simplicity. But, as soon as it was fully realized that this simple assumption
fails to fit the psychological data [2], and that a single general factor, g, is insufficient
to explain all the correlations between the tests, it became necessary to reintroduce
group factors as well [3]. Spearman himself for long refused to admit the need
for such supplementary factors; and, when at length he agreed to adopt them,
he again applied a strict principle of economy, and insisted that the group factors
should be as few as possible and that their existence and statistical significance
should first be fully established.

With a view to determining their number and extent, he himself relied on
the examination of tetrad differences [4]. In its practical application, however,
this procedure presents a number of grave difficulties. It is extremely laborious;
it still does not always lead to a unique factorial solution; and the sampling errors
for the tetrad differences can only be calculated by the aid of formulae which seem
in point of fact to be somewhat crude approximations. In practice, therefore,
the employment of this procedure has remained confined to small tables containing
only a few tests and including only a single group factor, a limitation that
has not always been noticed.
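
As a rough illustration of the tetrad criterion, the Python sketch below computes every tetrad difference for a small table of correlations. The saturations are hypothetical, and sampling errors are ignored altogether. With a single general factor all the differences vanish; when two of the tests are made to share an additional group factor, some of them no longer do.

```python
from itertools import combinations


def tetrad_differences(r):
    """The three tetrad differences for every set of four tests."""
    out = []
    for a, b, c, d in combinations(range(len(r)), 4):
        out.append(r[a][b] * r[c][d] - r[a][c] * r[b][d])
        out.append(r[a][b] * r[c][d] - r[a][d] * r[b][c])
        out.append(r[a][c] * r[b][d] - r[a][d] * r[b][c])
    return out


# Hypothetical saturations of five tests with a single general factor.
g = [0.9, 0.8, 0.7, 0.6, 0.5]
r = [[g[i] * g[j] for j in range(5)] for i in range(5)]
one_factor = max(abs(t) for t in tetrad_differences(r))

# The first two tests now share an extra group factor (hypothetical increment).
r2 = [row[:] for row in r]
r2[0][1] = r2[1][0] = r[0][1] + 0.2
with_group = max(abs(t) for t in tetrad_differences(r2))

print(one_factor < 1e-12, round(with_group, 3))
```

Five tests already require fifteen tetrads, and the number grows rapidly with the size of the battery, which gives some idea of why the procedure is laborious.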




An alternative method, more recent, which has been largely overlooked by 
psychologists, could be substituted with great advantage for that employed by 
Spearman. This is the method proposed by P. Delaporte [5]. It permits 
much larger batteries to be analysed, and allows us to use tables giving the con- 
fidence intervals for the ratio between the correlation coefficients observed. It 
thus permits the investigator to establish the factorial pattern quite quickly and 
without detailed calculation. However, in its practical application a few minor 
questions still remain to be solved: in certain cases it is by no means easy to decide 
whether the principle of economy should be applied to the number or to the 
extent of the factors; and consequently a subjective choice, which takes into 
account the actual content of the test, will then be necessary. 

Very few psychologists have in point of fact continued to follow the line 
proposed by Spearman. Burt has reviewed the reasons which led him to criticize 
this simplified method of analysis [6]. With the somewhat drastic principle 
of economy advocated by Spearman the chief difficulty is that the general factor 
which is actually extracted can vary widely as a result of the modifications that 
may be introduced in the composition of the test-battery. In the majority of 
cases its effect is to suppress the broadest group factor, which gets incorporated 
into the general factor, and as a result imparts to the modified general factor an 
additional content which should properly be assigned to the group factor, for 
example, a verbal content or a spatial content according to the nature of the 
tests employed. As a result the general factors so obtained cease to be comparable
from one battery to another. 

To overcome this difficulty, several investigators have adopted the bi-factorial 
method of Holzinger [7] which interprets the intercorrelations of the tests in 
terms of a general factor, g, and a number of group factors which are mutually 
exclusive. However, the procedures proposed for deciding how the tests are 
to be grouped, for calculating the factor saturations, and for testing the statistical 
significance of the residual correlations left after the common factors have been 
extracted, appear for the most part to be rather crude. 

In a recent paper on group factor analysis [8] Burt has furnished a method 
of great value for those psychologists who wish to carry out a group factor analysis 
on a more precise basis. Readers will there find, clearly summarized, a critical 
study of the various procedures most commonly used for establishing the factor 
pattern (subjective classification, the B coefficient of Holzinger, Tryon's method 
of profiles, and the device of intercolumnar correlations), together with an account 
of what Burt calls the ‘ method of summation’ for calculating the factor saturations 
illustrated by worked examples. Moreover, the procedure described indicates 
how a preliminary bipolar analysis may be used with advantage to determine the 
best grouping of the tests, and also allows the investigator to calculate factor 
saturations even when there is a partial overlap between certain of the group 
factors. 

SUBDIVIDED FACTORS 


It is likewise to Burt that we owe an important extension of the group factor
procedure, namely, the method of ‘subdivided factors’ [9]. After describing
the hypothesis put forward by McDougall and other writers, according to which
the structure of the mind is believed to be essentially ‘hierarchical’, he shows
that a divergent scheme of classification often proves to be appropriate to
psychological data, and that, when a table of correlations fulfils certain conditions
(in particular when the residual cross-correlations remaining after each
factor has been extracted are non-significant), it is possible to distinguish over
and above the general factor certain broad group factors (e.g. an intellectual
factor and a practical factor) which in turn subdivide themselves into narrower
group factors (e.g. a verbal factor, a numerical factor, and the like, subsumed under
the heading of the intellectual group factor) and that these in turn can be
further subdivided into factors which are narrower still.

Many British authors (e.g. Vernon) have successfully used procedures
of this type. The preliminary bipolar analysis serves as a point of departure for
the calculation of the subdivided group factors. And with this mode of procedure
not only the arrangement, but also the number of factors (which is often
somewhat greater than that of the corresponding bipolar factors or of the
classical procedures) is automatically determined by the number of factors
extracted in the preliminary bipolar analysis. Thus, if we take the simplest
instance, namely, a bipolar analysis involving four factors (one general and
positive, one bipolar dividing this into two, and two more subdividing the positive
and negative sections into two subsections each), with cross-correlations
approximately zero, we find that, when expressed in terms of a group-factor
classification with positive saturations only, this yields as many as seven subdivided
factors: one general and a pair of broad group factors, each subdividing into a
different pair of narrower factors.

Although group factor analysis has been used fairly widely in psychology,
the procedure has its limitations. In the first place, if the preliminary method of
calculation is based on the method of simple summation, which is ordinarily
substituted for the more exact technique of weighted summation, then the more
rigorous tests for statistical significance, which are appropriate to weighted
summation, are no longer strictly valid, and must be regarded as approximate
only. In the second place, group factor analysis, like every other factorial
procedure, can only be applied to batteries of tests which have been carefully
planned in view of the problem to be investigated. “The common idea”,
says Burt, “that factor analysis can be mechanically applied to any set of data that
happens to be handy, and that factors can thus be ‘discovered’, is entirely
misleading.”


MULTIFACTORIAL ANALYSIS

Thurstone, in opposition to the theory put forward by Spearman, starts
from the hypothesis that the inter-correlations of a battery of tests are to be
explained, not in terms of a single general factor, but by a small number of common
factors [10, 11]. As is well known, the proof that all the tetrad differences
(i.e. minor determinants of order two) are equal to zero is the mathematician’s
regular way of demonstrating that the rank of the matrix from which they are
derived is one. Thurstone invokes the general form of this theorem. “In the
multifactorial problem”, he says, “we inquire at the outset what is the rank of
the given table of intercorrelations; and this rank indicates the number of factors
which have to be postulated in order to interpret the correlations experimentally
obtained.” However, in practice “the rank of a correlation matrix is always
equal to its order, because of the accidental errors of sampling entering into each
of its elements. Consequently, we are compelled to make use of an approximation
to the rank of the matrix. We ask, in point of fact, whether it is possible to
construct a theoretical correlation matrix of lower rank, which will only differ
to a slight extent from the correlations experimentally observed. When such
a correlation matrix can be found, then its rank gives the number of factors which
have to be postulated to interpret the observed correlations” [12].

The method thus proposed by Thurstone for solving this problem appears 
to involve several difficulties, which I shall now briefly attempt to indicate. 


COMMUNALITIES 


The first question to be settled is the size of what are called the
‘communalities’, that is to say, the quantities which are to be entered in the
principal diagonal of the correlation matrix in order to make its rank a minimum.
Now, at the beginning of the analysis, the rank of a correlation matrix is unknown; 
and strictly speaking we can only secure estimates for the communalities by 
determining (as in the case of a correlation matrix of rank one) the lowest order 
of minor determinants which are equal to zero. Several working procedures, 
all of them more or less crude, have been suggested and discussed. But the 
majority of analysts appear content to adopt Thurstone’s short cut, and, before 
extracting each factor, insert in each diagonal cell the largest correlation in the 
column. When the number of tests is large, the error thus introduced is com- 
paratively unimportant. However, Burt and other English investigators prefer 
to carry out the estimation of the communalities or self-correlations by successive 
approximation. With smaller tables of correlations some investigators may be 
sufficiently interested to try the method of maximum likelihood described by 
Lawley, provided they are ready to face the rather laborious calculations involved. 
This yields efficient estimates, and incidentally provides a criterion for deciding 
the number of factors to be extracted [13 and refs.]. 
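
The short cut attributed above to Thurstone is easily put into code. The Python sketch below assumes that ‘largest’ means largest in absolute value, and uses a small hypothetical table of correlations; it shows the first step only, the insertion being repeated on the residual table before each further factor is extracted.

```python
def insert_largest_correlation(r):
    """Copy of the table with each diagonal cell replaced by the largest
    off-diagonal correlation (in absolute value) found in its column."""
    n = len(r)
    out = [row[:] for row in r]
    for j in range(n):
        out[j][j] = max(abs(r[i][j]) for i in range(n) if i != j)
    return out


# Hypothetical correlations between four tests.
r = [
    [1.00, 0.72, 0.63, 0.54],
    [0.72, 1.00, 0.56, 0.48],
    [0.63, 0.56, 1.00, 0.42],
    [0.54, 0.48, 0.42, 1.00],
]
estimated = insert_largest_correlation(r)
print([estimated[j][j] for j in range(4)])
```

With only four tests the estimates are evidently rough; the text notes that the error matters less as the battery grows.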

The next problem consists in computing from the table of correlations as
thus completed the various common factors which will account for the observed
coefficients within the margin allowed by the fluctuations of sampling. Thurstone
seems now [14] to view with more favour than before [11] the method of ‘principal
components’ as described by Hotelling: this is the procedure with which British
writers are more familiar as the ‘method of principal axes’, originally described
by Karl Pearson [15]. With this procedure we obtain a series of orthogonal
factors by “extracting at each stage that particular factor which will account for
a maximum amount of the common factor variance”, the so-called method of
‘weighted summation’ [16]. The method of ‘simple summation’ described
by Burt and the equivalent procedure called the ‘centroid method’ by Thurstone
are quicker to apply, and furnish a reasonably good approximation to the principal
components; the centroid method is that most commonly employed by American
writers. All these procedures can readily be applied to practically any table of
correlations. The chief difficulty that remains is to determine at what point the
analysis must be stopped.
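
For readers unfamiliar with simple summation, the extraction of the first factor can be sketched as follows. The Python below assumes the usual form of the centroid calculation, in which each saturation is the column total divided by the square root of the grand total, with communalities already entered in the diagonal. The data are hypothetical and constructed to contain exactly one factor, so that the residuals vanish.

```python
import math


def first_centroid_factor(r):
    """First-factor saturations by simple summation:
    column total divided by the square root of the grand total."""
    n = len(r)
    column_totals = [sum(r[i][j] for i in range(n)) for j in range(n)]
    grand_total = sum(column_totals)
    return [t / math.sqrt(grand_total) for t in column_totals]


def residuals(r, saturations):
    """Correlations left over after the factor has been removed."""
    n = len(r)
    return [[r[i][j] - saturations[i] * saturations[j] for j in range(n)]
            for i in range(n)]


# Hypothetical one-factor table, with communalities g_i squared in the diagonal.
g = [0.9, 0.8, 0.7, 0.6]
r = [[g[i] * g[j] for j in range(4)] for i in range(4)]

saturations = first_centroid_factor(r)
largest_residual = max(abs(x) for row in residuals(r, saturations) for x in row)
print([round(a, 3) for a in saturations], largest_residual < 1e-12)
```

With real data the residuals do not vanish, and the same summation is applied to the residual table (after reflecting signs where necessary) to obtain the next factor.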


NUMBER OF FACTORS


How many factors should be extracted has been a question on which numerous
inquiries have been published, and is still far from being completely solved.
Many criteria have been put forward for testing the statistical significance of the
residual correlations which remain after a given number of common factors have
been extracted [17]. Of these the more exact demand lengthy calculations
and moreover are not valid when the analysis has been carried out by the centroid
procedure. The majority are based on more or less empirical rules, which in
point of fact scarcely yield any more information than can be obtained by simple
inspection without any calculations derived from the residuals. Indeed, in
practice, many factorists employ none at all. Thurstone and his pupils apparently
prefer to extract more factors than is strictly necessary, and then at the end of the
analysis to discard those which are not susceptible of psychological interpretation.
This lack of rigour is perhaps partly justified, since, as Thomson has indicated,
the sampling errors can themselves produce common factors which will not be
concentrated in the last centroid factor, but will be mingled with all such factors;
and Burt has further insisted that “the discovery that, with the sample used,
a given factor turns out to be non-significant cannot be taken to prove that it is
non-existent”.

Actually the aim of the psychologist is not so much to extract from a correlation 
matrix the minimum number of factors as to verify the existence of the factors 
that he has postulated, and if necessary to discover whether there may not be 
further groupings of the tests employed which have hitherto been overlooked. 
In order to judge whether the number of factors extracted is sufficient, he will 
in the end nearly always need to examine the particular results obtained. For
example, in an analysis where three factors furnished non-significant residuals
as judged by McNemar’s criterion, I myself have been able to show, simply by
examining the diagram obtained on plotting the test-vectors, that four factors
would be required to group the tests correctly and achieve a simple structure
[18].


[...] to reveal do not form a small [...] facts seem clearly to prove, the [...]
number. In a recent monograph dealing [...] achievement tests in terms of rotated
fac[...] well over sixty. Now in any given factorial investigation it is possible to extract
only a very few fully significant factors. The actual number is unavoidably
limited by the number of the subjects who have been tested. After a study of
a series of different researches Burt considers that, as a rule, with a battery of
about ten to twenty tests, at least twenty subjects are needed to establish one
factor, at least fifty subjects to establish two, over a hundred to establish three,
and as many as two or three hundred to establish four. Thus a factorial analysis
can achieve a result which is at once statistically accurate and capable of a clear
interpretation only when the number of factors aimed at has been scrupulously
limited from the very outset.


SIMPLE STRUCTURE 


The different methods of analysis which we have briefly enumerated—the 
method of simple summation or centroid analysis, the method of principal 
components or weighted summation, the method of maximum likelihood—yield 
factors which have negative loadings in certain tests and which often vary widely 
if the composition of the test-battery is changed. Such factors, so Thurstone 
contends, must be devoid of ‘scientific meaning’. They do not permit us to 
“interpret the various tests as functions of the mental aptitudes which those 
tests elicit”. In order to acquire a meaning, therefore, the factors should be 
subjected to rotations, which are continued until the factorial matrix has the 
largest possible number of values which are either zero or approximately zero. 
If the investigator succeeds by this device in obtaining at least one zero saturation 
in each row, and at least as many zeros in each column as there are factors, he 
is said to have achieved a ‘simple structure’. When such a structure has been 
secured, however, the group factors that emerge are no longer of necessity orthogonal
to each other. Thurstone regards them as ‘primary factors’, and believes
them to be invariant and susceptible of psychological interpretation [14].
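
The two conditions for a simple structure stated above lend themselves to a mechanical check. In the Python sketch below the tolerance within which a saturation counts as ‘approximately zero’ is an arbitrary assumption (no such figure is given in the text), and both tables of saturations are hypothetical.

```python
def satisfies_simple_structure(saturations, tolerance=0.1):
    """True if every row (test) has at least one near-zero saturation and
    every column (factor) has at least as many near-zero saturations as
    there are factors. The tolerance is an arbitrary working limit."""
    n_factors = len(saturations[0])
    is_zero = [[abs(a) <= tolerance for a in row] for row in saturations]
    every_row_has_zero = all(any(row) for row in is_zero)
    zeros_per_column = [sum(row[k] for row in is_zero)
                        for k in range(n_factors)]
    return every_row_has_zero and all(z >= n_factors for z in zeros_per_column)


# Hypothetical rotated saturations of five tests on two factors.
clean = [[0.7, 0.0], [0.6, 0.05], [0.8, 0.02], [0.0, 0.7], [0.03, 0.6]]
# The same battery with a sixth test that involves both factors.
mixed = clean + [[0.4, 0.5]]

print(satisfies_simple_structure(clean), satisfies_simple_structure(mixed))
```

Widening the tolerance makes the criteria easier to satisfy, so the verdict depends on a limit chosen by the analyst.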

In practice, the search for such a solution is not without difficulties of its 
own. No method that is absolutely automatic and objective has been devised 
for securing it. And the successive rotations which are needed require long and 
laborious trials. In practice, since there are no formulae for determining the 
errors in the factor saturations thus obtained, many analysts have arbitrarily 
adopted their own limit of significance ; and indeed often seem a little too prone 
to consider that they have satisfied the criteria for a simple structure. By such 
procedures they manage to start from almost any kind of correlation table and 
reach a set of ‘ primary factors’ whose stability and whose psychological interest 
are in point of fact far from being adequately established. Here assuredly there 
lies a great danger—one in fact on which Thurstone himself has insisted. 
“Factor analysis”, he says, “is a scientific method which has to be adapted to
each problem: it is no mere statistical procedure; and it is not a routine device
which one can fruitfully apply to any table whatsoever.” It would seem, then,
that every factorial investigation ought to start with a plan of inquiry, and that
a ‘simple structure’ can have no definite meaning unless it is reached as a confirmation
of factors postulated at the outset.




THE IDENTIFICATION AND INVARIANCE OF FACTORS 


By way of defending the superiority of their own particular method, the
champions of different factorial procedures commonly assert that theirs will
yield factors which can be most readily identified and which will remain the most
stable when one changes either the battery of tests or the population tested.
And in order to secure an objective basis for determining the identity of the factors
thus extracted, a number of widely different procedures have been devised. The
following seem to be the most satisfactory suggestions [20]: (i) when several
wholly independent analyses have been carried out, to compare the amount
contributed to the variance by corresponding factors in different analyses;
(ii) when the experiments have been carried out on the same group of subjects,
to calculate the correlations between the estimates for corresponding factors;
(iii) when the experiments have been carried out with the same battery of tests,
to compare the factor saturations for the several tests by means of an appropriate
coefficient: for this purpose Burt has devised a ‘coefficient of similarity’ between
the two corresponding sets of saturations, while Cattell prefers to use a ‘mar[...]
criterion’ or a ‘coefficient for similarity of patterns’.
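
A comparison of the kind described under (iii) might be sketched as follows. The formula is not given in the text; the Python below assumes an uncentred product-moment coefficient between the two sets of saturations (the sum of cross-products divided by the root of the product of the sums of squares), which may or may not correspond exactly to the coefficient devised by Burt. The saturations are hypothetical.

```python
import math


def similarity_coefficient(a, b):
    """Uncentred product-moment coefficient between two sets of saturations
    for the same tests (an assumed form, not taken from the text)."""
    cross = sum(x * y for x, y in zip(a, b))
    return cross / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


# Hypothetical saturations of the same four tests in three analyses.
first = [0.8, 0.7, 0.6, 0.5]
proportional = [0.40, 0.35, 0.30, 0.25]   # same pattern, half the size
reversed_order = [0.5, 0.6, 0.7, 0.8]     # same sizes, different pattern

print(round(similarity_coefficient(first, proportional), 3),
      round(similarity_coefficient(first, reversed_order), 3))
```

A coefficient of this form is unaffected by a uniform change in the size of the saturations and responds only to differences in their pattern.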

Hitherto these procedures have been but rarely used, and that for several
reasons. Until quite recently there have been comparatively few factorial
investigations whose results would permit their application: it is in fact rather
exceptional for the factor measurements to be estimated in detail for each of the
persons tested; and, even so, a knowledge of the reliability of the measurements
would seem to be necessary before any trustworthy information can be deduced
from their correlations. Moreover, it is seldom possible to compute even the
simpler type of coefficient based on a comparison of the factor saturations, since, 
in their desire to carry out original work, the majority of authors appear reluctant 
to adopt batteries that have already been used in previous researches, and nearly 
always prefer to substitute new tests of their own. French observes that, in the 
comparative study that he has made of some 69 analyses, he has been obliged to 
limit himself to a subjective comparison of the tests because there were only four 
instances in which precisely the same batteries had been employed [19]. 
Moreover, the practical value of the different procedures still needs to be more 
fully demonstrated. Leyden [21], using data from a battery of tests administered 
to the same group of pupils at intervals of 1 to 3 years, has applied most of the 
foregoing criteria to test the stability of the general factor, and finds that they 
give very divergent results. On the whole, the so-called coefficient of similarity 
devised by Burt would seem to be the best for general use. However, such 
methods merely supply a general indication as to whether or not the factorial 
pattern exhibits any significant change, and should always be supplemented by 
a more detailed study of the material in order to take full account of all the complex 
issues involved. On applying one and the same battery to two populations at 
different levels, I myself found that, with Thurstone’s procedure, the factorial 
composition of some of the tests appeared to change appreciably, and so modified 
the theoretical pattern, although for the majority of tests the saturations remained
fairly stable, and thus at first sight suggested that the factors extracted in the two
investigations were in fact identical.

Nevertheless, even when no rigorous methods are applicable for identifying 
and comparing factors, the factorist should not confine himself merely to a 
superficial examination of the factor saturations and of the psychological content 
of the tests. He should also take into consideration the general conditions of the 
experiment as a whole. The question how far the influence of the general factor, 
g, diminishes with age, while at the same time special aptitudes become more and 
more plainly differentiated, provides one of the most typical examples of the 
difficulties arising in interpreting factorial results. Numerous researches have 
been carried out on this problem, and have led to very discrepant and even
contradictory conclusions. In a brief analysis of these various inquiries, Vernon
[22] has shown clearly that alterations in the factorial schemes depend not only on 
the composition of the test batteries, but also on the degree of heterogeneity or 
selection in the populations tested and on their educational or vocational training : 
he emphasizes the fact that, even if random samples of the population could be 
tested at different age levels, the content of the tests would also have to be adapted 
to the different ages, in order not to introduce modifications into the variance of 
the first factor. It would seem, therefore, that the influence of age presents 
a problem which is peculiarly difficult to solve. 


THE NATURE OF THE FACTORS


When considering the problems raised when we attempt to determine the 
nature of the factors, it is well to bear in mind the view expressed by Thomson. 
“I do not believe”, he writes, “that the factors arrived at by minimizing the 
number of common factors and maximizing the part played by specifics can be 
in any sense realities, either in a physiological or a psychological sense. I think 
that the causes of ability to do well in this, that, or the other test are innumerable 
and small, some of them genetic, some of them acquired, and that the correlations 
or resemblances between test-lists are due to some of these causes being helpful 
in both tests, which overlap in the sample they draw on from these innumerable 
little causes, which I would not dream of calling ‘ factors’, and which are different 
in different persons. It remains true, however, that factors form a vivid and 
convenient description of any battery of tests, a description we would be silly 
to forgo, but also a description we must keep in its place and not raise to the 
status of a causal explanation” [23]. All this appears to me decidedly helpful, 
because it is likely to induce the psychologist to recognize the difficulties and the 
limitations, as well as the possibilities, of factorial methods. 

Moreover, if factors are thought of as possessing essentially a descriptive role, 
then the divergences between the different factorial techniques will no longer 
seem irreconcilable. “The big difference”, as Thomson himself points out,
“between Thurstone’s American school of thought and the view of a majority
of British psychologists is that t[…]
cling to a general factor g, where[…]
The difference is not fundamental except to those who reify their factors and look
upon them as actual entities, possibly with physiological bases such as a certain
brain centre, a certain endocrine gland, or the like. Others, who look upon 
factors only as convenient descriptive parameters, realize that the two descriptions, 
with or without a g, are equally capable of accurately specifying a test or a man." 

From a theoretical standpoint, the antagonism between the two schools is 
now tending to disappear. In adopting oblique factors, Thurstone has indirectly 
recognized the existence of a general factor; and he has since explicitly 
reintroduced such a factor in his recent attempt to re-analyse his centroid factors 
in terms of what he calls ‘second order factors’ [14]. Possibly the differences
between these alternative types of analysis might have greater practical significance,
were the factors themselves estimated in detail for each of the subjects tested.
In practice, however, psychologists rarely make such estimates, and appear […]
contain seem to all intents and purposes to be very much the same.

The view that the relations between tests are to be interpreted in terms of
a large number of small causes suggests a reason why factor analysis does not help
[…] investigations to secure a small number of factors that are really stable. Beyond
a doubt, no psychologist at the present day would pretend that it is sufficient to
apply a large variety of tests to a large number of persons in the hope of determining,
in a single research, all the ‘factors of the mind’. Nevertheless, many investigators
still embark on factorial inquiries without clearly realizing that the factors
[…] batteries, and that factorial analysis cannot be safely employed except in strictly
limited types of experiment, namely, those that have been deliberately designed
with the needs of a factorial study in view. It is therefore only valid when,
on the basis of some initial psychological hypothesis, a provisional classification
of the data has been explicitly laid down at the outset. It is this constant neglect
of the intrinsic limitations of factorial procedures that appears to be the main
reason why so many arduous investigations now seem to have been entirely wasted.


CONCLUSIONS 


The main conclusions that emerge from this broad review may be briefly 
summed up as follows. Factorial analysis is not a mode of psychological research
which is sufficient in itself. Unaided, such a method, as Burt has said [24],
does not permit a determination of causal influences: these can only be discovered
“by means of experimentation, of controlled observation, and of introspection […]
may, in certain cases, tend to modify or even to s[…]
It is in short a mathematical test like calculating Student’s t, or chi-squared, or an
analysis of variance. Confined to this role it provides the psychologist with
a supplementary aid of great value which can be used in the course of, but not as
the sole basis of, a psychological research. It is to be hoped that a further
development of factorial techniques will render such verificatory tests more
exact and more effective.


REFERENCES 


[1] Spearman, C. (1914). The theory of two factors. Psychol. Rev., XXI, 101-115.
[2] Burt, C. (1909). Experimental tests of general intelligence. Brit. J. Psychol.,
    III, 94-177.
[3] Board of Education (1924). Report on Psychological Tests of Educable Capacity.
    London: H.M. Stationery Office.
[4] Spearman, C. (1927). The Abilities of Man. London: Macmillan.
[5] Delaporte, P. (1947). Prolongement de la méthode d'analyse factorielle de Spearman
    en utilisant la statistique mathématique. Biotypologie, IX, 45-59.
[6] Burt, C. (1949). The two-factor theory. Brit. J. Psychol. (Stat. Sect.), II, 151-179.
[7] Holzinger, K. J., and Harman, H. H. (1941). Factor Analysis: a Synthesis of
    Factorial Methods. Chicago: University of Chicago Press.
[8] Burt, C. (1950). Group factor analysis. Brit. J. Psychol. (Stat. Sect.), III, 40-75.
[9] Burt, C. (1949). Subdivided factors. Brit. J. Psychol. (Stat. Sect.), II, 41-63.
[10] Thurstone, L. L. (1931). Multiple factor analysis. Psychol. Rev., XXXVIII,
    406-427.
[11] Thurstone, L. L. (1935). The Vectors of Mind. Chicago: University of Chicago
    Press.
[12] Thurstone, L. L. (1951). L'analyse factorielle: méthode scientifique. Année
    psychologique, L, 61-75.
[13] Emmett, W. G. (1949). Factor analysis by Lawley's method of maximum likelihood.
    Brit. J. Psychol. (Stat. Sect.), II, 90-97.
[14] Thurstone, L. L. (1947). Multiple Factor Analysis. Chicago: University of
    Chicago Press.
[15] Pearson, K. (1901). On lines and planes of closest fit to systems of points in space.
    Phil. Mag., II, 6 Ser., 559-572.
[16] Burt, C. (1949). Alternative methods of factor analysis and their relations to
    Pearson's method of ‘principal axes’. Brit. J. Psychol. (Stat. Sect.), II, 98-121.
[17] Burt, C. (1952). Tests of significance in factor analysis. Brit. J. Psychol. (Stat.
    Sect.), V, 109-133.
[18] Bernyer, G. (1956). Analyses factorielles de deux batteries de tests appliquées aux
    candidats d'une école de mécanique. Biotypologie, XVII (in the press).
[19] French, J. W. (1951). The description of aptitude and achievement tests in terms of
    rotated factors. Psychometric Monographs, No. 5. Chicago: University of Chicago
    Press.
[20] Barlow, J. A., and Burt, C. (1954). The identification of factors from different
    experiments. Brit. J. Stat. Psychol., VII, 52-56.
[21] Leyden, T. (1953). The identification and invariance of factors. Brit. J. Stat.
    Psychol., VI, 119.
[22] Vernon, P. E. (1950). The Structure of Human Abilities. London: Methuen.
[23] Thomson, G. H. (1951). Factor analysis, its hopes and dangers. Proc. XIIIth
    Internat. Cong. Psychol. Stockholm.
[24] Burt, C. (1947). L'analyse factorielle dans la psychologie anglaise. Biotypologie,
    IX, 7-44.


Vol. X, Part I        The British Journal of Statistical Psychology        May 1957


THE COMPARISON OF FREQUENCIES IN MATCHED SAMPLES 


By ALAN STUART 
Research Techniques Unit, London School of Economics 


Abstract 


For purposes of statistical comparison, matched samples are in general more 
accurate than unmatched. In testing the statistical significance of the results, 
it is essential to take into account the gain in accuracy due to the matching. The
following paper indicates how this can be done in the case of attributes. 


1. INTRODUCTION 


Psychologists and others often seek to eliminate the effects of unwanted 
sources of variation on comparisons between two groups by 'matching' 
individuals, one from each group, for values of the variables to be eliminated. 
If a quantitative variable is being investigated, differences between matched 
individuals may be tested by a Student's t test or some analogous distribution-free
procedure. A similar test is required when an attribute is being investigated,
but does not seem to have been made available. It turns out that such a test
can be very simply derived (almost entirely by reference to the ordinary 2×2
table) and that it yields a test statistic whose distribution is approximately
normal. A numerical example will be given in the last section.


2. THE STANDARD TEST

Let us write the 2×2 table in the usual form as follows:


                     Number        Number not
                     possessing    possessing     Sample size
                     attribute     attribute
First sample         a             b              a+b
Second sample        c             d              c+d
Combined samples     a+c           b+d            a+b+c+d = n


The standard test for the equality of the proportions in the two groups
from which random samples were drawn is made by computing

$$\chi^2 = \frac{n(ad-bc)^2}{(a+b)(a+c)(b+d)(c+d)}. \tag{1}$$

If all the cell frequencies are of reasonable size (say 5 or more), this will be
distributed with a close approximation to the chi-squared distribution with
1 degree of freedom.
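
As an illustration of (1), the following short sketch computes the statistic from the four cell frequencies. The function name and the counts are hypothetical and are not taken from the paper.

```python
def chi_squared_2x2(a, b, c, d):
    """Equation (1): n(ad - bc)^2 / {(a+b)(a+c)(b+d)(c+d)}."""
    n = a + b + c + d
    denominator = (a + b) * (a + c) * (b + d) * (c + d)
    if denominator == 0:
        raise ValueError("a marginal total is zero, so the statistic is undefined")
    return n * (a * d - b * c) ** 2 / denominator

# Illustrative counts only: 20 of 50 in the first sample, 10 of 50 in the second.
print(chi_squared_2x2(20, 30, 10, 40))
```

The result is referred to the chi-squared distribution with 1 degree of freedom, subject to the condition on cell sizes stated above.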




Another way of looking at the test is to remember that a chi-squared variate
with 1 degree of freedom is the square of a standardized normal variate. Thus

$$\chi = \frac{(ad-bc)\sqrt{n}}{\sqrt{(a+b)(a+c)(b+d)(c+d)}} \tag{2}$$

is approximately such a variate. Let us re-arrange (2) to give the algebraically
identical equation:

$$\chi = \frac{\dfrac{a}{a+b}-\dfrac{c}{c+d}}{\sqrt{\dfrac{a+c}{n}\cdot\dfrac{b+d}{n}\left(\dfrac{1}{a+b}+\dfrac{1}{c+d}\right)}}. \tag{3}$$

We then see that $\chi$ is simply the ordinary test for the difference between the
observed proportions a/(a+b) and c/(c+d), the difference being tested against
its standard error estimated from the combined samples.

If the samples are of equal size, namely, a+b = c+d = ½n, then (3) becomes

$$\chi = \frac{a-c}{\sqrt{(a+c)\left(1-\dfrac{a+c}{n}\right)}}. \tag{4}$$
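
The agreement between (3) and its equal-sample form (4) can be checked numerically. The sketch below assumes that (4) has the form obtained by putting a+b = c+d = ½n in (3), and it borrows the fictitious counts of Section 6 purely as convenient inputs. The function names are hypothetical.

```python
from math import isclose, sqrt

def chi_from_proportions(a, b, c, d):
    """Equation (3): difference of proportions over its pooled standard error."""
    n = a + b + c + d
    difference = a / (a + b) - c / (c + d)
    pooled_variance = (a + c) / n * (b + d) / n
    return difference / sqrt(pooled_variance * (1 / (a + b) + 1 / (c + d)))

def chi_equal_samples(a, c, n):
    """Equation (4), assumed form: valid only when a+b = c+d = n/2."""
    return (a - c) / sqrt((a + c) * (1 - (a + c) / n))

a, b, c, d = 5, 45, 13, 37          # two samples of 50 each, so n = 100
n = a + b + c + d
print(chi_from_proportions(a, b, c, d), chi_equal_samples(a, c, n))
print(isclose(chi_from_proportions(a, b, c, d), chi_equal_samples(a, c, n)))
```

Squaring either value reproduces the chi-squared statistic of (1) for the same table.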


3. THE EFFECT OF MATCHING

We now examine the effect of a matching procedure upon our test statistic.
As before, we wish to test the difference in the numerator of $\chi$ in (3). On the
hypothesis that there is no true difference between the matched groups, that 
numerator will have a mean value of zero, as in the unmatched case; but its 
standard error will be reduced by the matching process, which has the effect of 
a stratification upon a random sample. In the denominator of (3) the factor 
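$$\frac{a+c}{n}\cdot\frac{b+d}{n} = \frac{a+c}{n}\left(1-\frac{a+c}{n}\right), \tag{5}$$

which is an estimate of the ordinary binomial (Bernoullian) variance, should
therefore be replaced by the unbiased within-strata (Poissonian) estimate,

$$\frac{2}{n}\sum_{i=1}^{n/2} 2\left(\frac{x_i+y_i}{2}\right)\left(1-\frac{x_i+y_i}{2}\right), \tag{6}$$

where $x_i$, $y_i$ refer to the first and second samples respectively, and take the value 1
when the attribute is present in an individual and 0 when it is not. Equation (6)
simplifies to

$$\frac{2}{n}\sum_{i}\left\{(x_i+y_i)-\tfrac{1}{2}(x_i+y_i)^2\right\}; \tag{7}$$

and this becomes

$$\frac{2}{n}\left\{(a+c)-\tfrac{1}{2}\sum_{i}(x_i+y_i)^2\right\}, \tag{8}$$

since $\sum x_i = a$ and $\sum y_i = c$. Moreover, since $x_i$ and $y_i$ = 0 or 1 for all $i$, we
may further simplify (8) to

$$\frac{2}{n}\left\{(a+c)-\tfrac{1}{2}\sum_{i}(x_i+y_i+2x_iy_i)\right\} = \frac{1}{n}\{(a+c)-2S\}, \tag{9}$$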




where we have written $S = \sum x_i y_i$. Here S is the number of matched pairs in
which both individuals possess the attribute. Substituting (9) for (5) in (3),
and putting (a+b) = (c+d) = ½n, we obtain

$$Y = \frac{a-c}{\sqrt{a+c-2S}} \tag{10}$$

as our standardized normal variate for the matched case. In testing for significance
Y is referred to the usual critical values of the normal distribution.

Unlike (4), its analogue in the unmatched case, (10) does not explicitly
contain n, the effect of total sample size being expressed by the other terms.
Moreover, though this is not at once evident from inspection, (10) is symmetric
as between a and c on the one hand and b and d on the other. If we
transpose the columns of the 2 x 2 table, we get exactly the same result, as we 
should for any reasonable test. 
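
The symmetry just described can be verified with a small sketch. The function names are hypothetical; the counts are again the fictitious ones of Section 6. On transposing the columns the statistic keeps its magnitude and merely changes sign, so the test reaches the same verdict.

```python
from math import isclose, sqrt

def matched_y(a, c, s):
    """Equation (10): Y = (a - c) / sqrt(a + c - 2S)."""
    discordant = a + c - 2 * s
    if discordant <= 0:
        raise ValueError("no discordant pairs: Y is undefined")
    return (a - c) / sqrt(discordant)

def transpose_columns(a, c, s, pairs):
    """Counts obtained by interchanging 'possesses' and 'does not possess'."""
    b, d = pairs - a, pairs - c
    neither = pairs - (a + c - s)   # pairs in which neither member has the attribute
    return b, d, neither

a, c, s, pairs = 5, 13, 4, 50
b, d, neither = transpose_columns(a, c, s, pairs)
print(matched_y(a, c, s), matched_y(b, d, neither))
```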


4. SENSITIVITY OF THE TEST 


Whether (10) provides a sensitive test in any particular situation will depend 
upon whether the matching procedure has been efficacious. Suppose, for 
instance, that the matching makes no difference to the joint occurrence of the 
attribute, i.e. suppose that the probability that an individual whose matched 
‘twin’ possesses the attribute will himself also possess it, is exactly the same as 
the overall probability of possessing the attribute, namely, (a+c)/n: then the 
expected number of matched pairs, each possessing the attribute, will be 


$$E(S) = \tfrac{1}{2}n\left\{\frac{a+c}{n}\right\}^2 = \frac{(a+c)^2}{2n}. \tag{11}$$


If in (10) we substitute (11) for S, we obtain for Y precisely the value of $\chi$ in (4).
That is just what we should expect. 
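
This substitution is easily checked by arithmetic. The sketch below assumes the forms Y = (a − c)/√(a + c − 2S) for (10) and χ = (a − c)/√{(a + c)(1 − (a + c)/n)} for (4); the function names and the counts are illustrative only.

```python
from math import isclose, sqrt

def expected_s(a, c, n):
    """Equation (11): expected S when matching has no effect on joint occurrence."""
    return (a + c) ** 2 / (2 * n)

def matched_y(a, c, s):
    """Equation (10)."""
    return (a - c) / sqrt(a + c - 2 * s)

def unmatched_chi(a, c, n):
    """Equation (4), assumed form for equal sample sizes."""
    return (a - c) / sqrt((a + c) * (1 - (a + c) / n))

a, c, n = 5, 13, 100
s0 = expected_s(a, c, n)
print(s0, matched_y(a, c, s0), unmatched_chi(a, c, n))
```

Any observed S above `s0` shrinks the denominator of (10) and so enlarges the absolute value of Y.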

Clearly, if S tends to be larger than the value given by (11), then (10) will 
be increased and the test made more sensitive. Conversely, if it tends to be smaller, 
we may actually lose efficiency by matching, since the individuals of a matched
pair may behave more differently than if unmatched. This is not, of course, a
contingency likely to arise very often in practice; but it is worth mentioning to 
underline the fact that the benefits of matching are not automatic. In normal 
circumstances, where it is reasonably certain that matching has done its work 
properly, it would consequently be reasonable to use (4) as an effective lower 
bound for (10) in cases where for some reason S cannot easily be computed. 
This will be less wasteful of information than simply ignoring the term 2S in (10).

The extreme example of the gain in sensitivity which may accrue as the 
result of matching will occur when S takes its maximum value, which will be the 


lesser of a or c. In this case, (10) gives

$$|Y| = \sqrt{|a-c|},$$

which is significant at the 5 per cent level (±1.96) whenever |a − c| ≥ 4.
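
The extreme case can be confirmed by direct search. This is a sketch with a hypothetical function name; it takes 1.96 as the two-sided 5 per cent point of the normal distribution, as in the text, and looks for the smallest difference between a and c that exceeds it when S is at its maximum.

```python
from math import sqrt

CRITICAL_5_PER_CENT = 1.96  # two-sided normal critical value quoted in the text

def abs_y_at_maximum_s(a, c):
    """|Y| from (10) when S takes its largest possible value, min(a, c)."""
    s = min(a, c)
    return abs(a - c) / sqrt(a + c - 2 * s)

# Smallest |a - c| (taking c = 0 for simplicity) that exceeds the critical value.
smallest_gap = next(k for k in range(1, 50)
                    if abs_y_at_maximum_s(k, 0) > CRITICAL_5_PER_CENT)
print(smallest_gap)
```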




5. CORRECTION FOR CONTINUITY 


In using the standard test statistic (1), we can improve the approximation of 
its distribution to the chi-squared distribution by applying Yates’ correction 
for continuity. The effect of this is to replace the term (ad − bc)² in the numerator
of (1) by (|ad − bc| − ½n)². The same principle applies here; and in (10) the
correction will have the effect of replacing |a − c| by |a − c| − 1.
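
A minimal sketch of the corrected statistic follows, on the reading that the correction reduces |a − c| by 1. With equal samples of ½n each, ad − bc = ½n(a − c), so Yates' quantity |ad − bc| − ½n equals ½n(|a − c| − 1), which is consistent with that reading. The function names are hypothetical, and flooring the corrected difference at zero is an assumption of the sketch rather than something stated in the paper.

```python
from math import copysign, sqrt

def matched_y_corrected(a, c, s):
    """Equation (10) with |a - c| replaced by |a - c| - 1."""
    discordant = a + c - 2 * s
    if discordant <= 0:
        raise ValueError("no discordant pairs: the statistic is undefined")
    shrunk = max(abs(a - c) - 1, 0)   # flooring at zero is an assumption here
    return copysign(shrunk, a - c) / sqrt(discordant)

def yates_quantity(a, b, c, d):
    """|ad - bc| - n/2, the quantity that is squared in the corrected form of (1)."""
    n = a + b + c + d
    return abs(a * d - b * c) - n / 2

print(matched_y_corrected(5, 13, 4))
print(yates_quantity(5, 45, 13, 37), 50 * (abs(5 - 13) - 1))
```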


6. EXAMPLE 


The following illustration is based on fictitious data. 50 boys and 50 girls, 
matched in pairs by age, size of family, father's educational level, and number of 
rooms per person in the home, were studied in order to determine whether
there is any sex-difference in performance at the 11 plus examination. The 
results were as follows: 


                Obtained            Did not obtain
Pupils          Grammar School      Grammar School
                place               place
Boys            5                   45
Girls           13                  37


The value of S, the number of matched pairs in which both boy and girl obtained 
a grammar school place, was 4. We have, from (10),

$$Y = \frac{5-13}{\sqrt{5+13-8}} = -2.53.$$

Although the difference is small, it is fully significant at the 5 per cent level,
and indeed almost at the 1 per cent level. The effect of the continuity correction
would be to reduce Y to about −2.2.
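
The calculation can be retraced from pair-level records. The list below is a hypothetical reconstruction: the paper gives only the totals, and any arrangement with 4 pairs in which both obtained a place, 1 in which only the boy did, and 9 in which only the girl did would serve equally well. The two-sided probability is taken from the normal distribution by means of the standard library's `math.erfc`.

```python
from math import erfc, sqrt

# Each tuple is (boy obtained a place, girl obtained a place); 50 matched pairs.
pairs = [(1, 1)] * 4 + [(1, 0)] * 1 + [(0, 1)] * 9 + [(0, 0)] * 36

a = sum(boy for boy, _ in pairs)            # boys obtaining a place
c = sum(girl for _, girl in pairs)          # girls obtaining a place
s = sum(boy * girl for boy, girl in pairs)  # pairs in which both obtained a place

y = (a - c) / sqrt(a + c - 2 * s)           # equation (10)
p_two_sided = erfc(abs(y) / sqrt(2))        # area in both tails of the normal curve

print(a, c, s, round(y, 2), round(p_two_sided, 3))
```

Only the 10 discordant pairs enter the denominator, which is why the total sample size does not appear explicitly in (10).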


I am grateful to Mr. J. Durbin for suggesting several improvements in the 
presentation of the paper.




HEREDITY AND INTELLIGENCE: A REPLY TO CRITICISMS 


By CYRIL BURT and MARGARET HOWARD
University College, London 


Summary 


An attempt is made to remove current misconceptions about the way in 
which individual differences in general intelligence appear to be inherited. 
Further support for a multifactorial form of the Mendelian theory is found in 
an examination (a) of the commoner objections, both methodological and 
statistical, and (b) of the chief alternative hypotheses, viz., (i) the theory of 
blending inheritance and (ii) the theory of exclusively or predominantly
environmental determination.


I. THE COMMONER TYPES OF CRITICISM 


Nature and Importance of Problem 

In several earlier publications (Burt [2], [5] and refs.) one of us endeavoured 
to show how effectively the Mendelian principles of inheritance seemed to 
explain the observable distribution of mental differences, especially differences 
in what is commonly termed ‘general intelligence’. During the past few 
years the problem of innate differences has come once again to the fore,
particularly in connection with the allocation of pupils under the Education Act of
1944 and the Report of the Royal Commission on Population in 1949. In the 
meantime the psychological surveys carried out in London, Bath, Scotland, and 
elsewhere have furnished new data; while the recent advances in genetics, 
the fuller understanding of the mechanisms of heredity, and above all the 
improvements in biometric techniques, now make it possible to re-examine the 
basic issues with greater hope of success. 

In the preceding issue of this Journal [9] we endeavoured to defend, on 
the basis of a more precise analysis, a theory previously put forward in general 
outline—namely, that innate differences in intelligence are due to Mendelian 
factors of two kinds: (i) major factors responsible for comparatively large 
deviations, usually of an abnormal kind, and (ii) multiple factors whose effects 
are small, similar and cumulative. However, the hypothesis that intelligence 
is inherited, both in the form that we had suggested and in various other versions, 
has of late been subjected to vigorous attack. In the main, the criticisms 
appear to fall into three more or less overlapping groups: first, those directed 
against the psychological and biological concepts in terms of which the hypothesis 
is expressed; secondly, those arising out of the abstract methodology of such 
inquiries; and thirdly, those concerned with the statistical techniques. As a 
rule, each critic singles out merely those particular aspects that happen to 
interest him, and ignores the other lines of argument. Nothing like a systematic 
survey of the various difficulties has so far been attempted. Some such review 




II. GENERAL MISCONCEPTIONS 
The Meaning of Heredity 


The most frequent type of criticism springs from misunderstandings about 
the exact nature of the issues involved and of the hypothesis put forward. Two 
distinct questions are constantly confused : (i) in what way, if at all, are genetic 
differences in ability transmitted ? (ii) how accurately do existing tests or other 
methods of estimation enable us to assess such differences ? 

In the first place, both critics and supporters of the doctrine of mental 
heredity seem as a rule to hold quite obsolete and misleading notions about the 
way biological characteristics are transmitted. Tacitly if not explicitly, ‘heredity’
is assumed to mean ‘the tendency of like to beget like’ (the definition
quoted by one critic from the Oxford English Dictionary). Consequently, the 
arguments for inheritance are supposed to consist in demonstrating resemblances 
between the parent and his children. When the two parents differ, the child 
is supposed to consist in a blend of both—rather on the lines of what the Arabian 
sage declared would happen if you mated a white elephant with a black giraffe, 
or the views that led older biologists to predict such hybrids as the centaur, 
the mermaid, and the minotaur, and induced even Aristotle to describe the 
giraffe as the progeny of a camel and a leopard. 

The approach of the modern geneticist is the reverse of all this. As he 
sees it, the real problem is rather to explain why in so many instances ‘ like 
begets unlike’. And the only satisfactory answer is that supplied by the 

‘mosaic theory of inheritance’ which we owe to Mendel : 


An atomic mechanism with unlimited power 

To vary the offspring in character, by mutual 

Inexhaustible interchange of the transmitted genes.¹
On this basis the chance re-combinations of a definite number of unalterable 
factors will yield, as a consequence of sexual reproduction, a wide variety of 
patterns in the ensuing generation, as dissimilar as the figures formed by shaking 
the coloured chips in a child’s kaleidoscope. 

As one of us has argued elsewhere, “ probably the most convincing proof 
of the genetic determination of abilities is the appearance of extremely dull 
children in the families of the well-to-do professional classes and of extremely 
bright children in families where both the cultural and the economic conditions 
of the home would, on the environmentalist’s hypothesis, condemn every child 
to manifest failure”. With the Mendelian hypothesis this is just what we
should expect. An important corollary follows. On this hypothesis a far
greater proportion of the child’s inherent characteristics are transmitted to him
through his parents than we should infer simply from the resemblances between
them as measured by coefficients of correlation.

¹ Robert Bridges, The Testament of Beauty, iii, 172-4.

There is a further reason why a preoccupation with mere resemblances 
may easily be misleading. The family is the chief instrument for transmitting,
not only innate biological characteristics, but also those intellectual, moral, and
social characteristics that are acquired afresh by each generation. A few of
our critics expressly recognize this fact, and even turn it against us. But here 
their arguments merely echo what we ourselves have always stressed. It was 
a point emphasized in the very first of our contributions ([2], p. 172); and we 
regarded it as a conclusive reason for not accepting test-results at their face 
value. The older types of test were particularly open to such influences.
In the earliest studies of the Binet tests, for instance, it was noted that “tests
which depend on a wide vocabulary or on items of skill and information imparted
during early life in a cultured home” are apt to favour the child of intellectual
parents regardless of his innate capacity; on the other hand, certain tests of a
performance type appeared, if anything, to favour the child from the
non-intellectual type of home ([4], pp. 194f., and [6], p. 59).

It was largely for this reason that, in formulating our theory, we have 
always attached special weight to the distinction between genotype and phenotype,
a distinction which practically all our critics ignore, although it is basic
to the whole Mendelian theory. There are, in fact, no such things as hereditary
characters; there are only hereditary capacities or tendencies: and the modern
geneticist continually insists that “almost every character can, and in fact
nearly always does, show heritable and non-heritable variation simultaneously”
[12]. To define ‘intelligence’ as ‘what intelligence tests measure’, and then
argue that “the family background and the environmental influences that
result are more than sufficient to account for all the observed differences in a
mental character like intelligence” (as Dr. Woolf, Dr. Heim, Dr. Fleming,
and several other critics seem to do) is to miss the main point of our argument.
No doubt, a few human characteristics, such as eye-colour or serological type,
are virtually unaffected by environmental conditions; and these, as we have
suggested, might profitably be used as ‘markers’ to determine in fuller detail
the transmission of mental tendencies. But no traits in which the psychologist 
is interested are likely to be determined either exclusively by inheritance or 
exclusively by environment; and for much the same reason the mere fact that 
a particular trait is demonstrably affected by environmental and post-natal 
conditions is no proof that it may not also be profoundly influenced by heredity. 

The wide-spread notion that the evidence for mental heredity rests almost 
entirely on the study of family resemblances seems largely due to the work of 
Karl Pearson. Pearson, unlike Galton, strongly opposed what Galton had 
called the ‘particulate theory’, and clung firmly to the traditional doctrine of
‘blended inheritance’. On this hypothesis it was natural to infer that the
coefficient of correlation, as an index of resemblance, supplied the best measure
of heredity. Even today most psychological and educational writers who
discuss the subject still implicitly accept the Pearsonian conception; and few 
seem to realize that any other interpretation of heredity is available. As a 
result they commonly take for granted that anyone who employs Pearson's 
statistical procedures, and calculates coefficients of correlation, must necessarily 
subscribe to Pearson's hypotheses and conclusions. Dr. Fleming [16], for 
example, describes “Pearson, Burt, Terman, and other workers in the first
few decades of the twentieth century” as “believing in the overriding importance
of hereditary differences”, and apparently supposes that all of them relied
on the same type of evidence and adopted the same preconceptions. Dr. Lewis
[24], in criticizing the application of a multifactor theory to such traits as
intelligence, similarly assumes that its advocates start with the postulates laid down
by Pearson. Both critics have evidently overlooked the objections which 
Burt [2] urged against Pearson's sweeping generalizations, and the arguments 
which he adduced in favour of the Mendelian doctrine, which Pearson with 
equal emphasis repudiated. 


The Need for Specific Investigations on Specific Characteristics 


There is another cautionary principle to be learnt from Mendel. Pre- 
Mendelian discussions of heredity nearly always dealt with organisms as wholes. 
When a hybrid animal was bred from two different parental strains, earlier 
biologists (including Darwin himself) were content to note the general similarity 
between the different creatures, much as they did in deciding whether a newly 
discovered animal or plant should be assigned to this or that species. Mendel, 
on the other hand, in his work on hybridization, announced at the very outset 
that it is essential to deal separately with separate characters. Psychology has 
been slow to adopt the same principle. In their references to the genetic 
differences between the sexes, for example, psychologists, educationists, and 
sociologists, even in the earlier decades of the twentieth century, still argued 
the question in the most comprehensive terms : do women display an all-round 
inferiority to men, or are they the equals of the male? The notion that the 
average woman might be genetically inferior in some capacities, superior in
others, and perhaps qualitatively different in yet other respects, or that certain
characteristics might be definitely sex-linked, has been confined chiefly to what
Dr. Maberly calls the ‘psychometric school’ [3]. Most medical psychologists,
indeed, sharply contest the ‘atomistic view’ on the ground that “each personality,
whether male or female, must be viewed as a unique and unanalysable
whole, not as a mosaic of particles or factors”.

A number of our critics, however, somewhat inconsistently assume that
[…] shares this monistic attitude. The […]
a single overall ‘proportion’ which will hold […]
of hereditary differences […]
in an equal degree. But this version of our views is quite unwarranted by
anything that we ourselves have said. Probably the notion that “Spearman,
Burt, and their co-workers use the data of their tests to deduce a single all- 
embracing estimate for the relative importance of nature and nurture ” springs 
from a misinterpretation of the phrase ‘ general intelligence’. Spearman, it is 
true, in his earlier correlational studies maintained that his ‘hierarchical tables’
demonstrated the presence of a single general factor only, ‘the sole determinant
of an individual's potential achievements in every walk of life’, and held that
specific abilities, or what were described as such, were simply the effects of 
acquired knowledge, interest, or skill. We ourselves would wholly agree with 
those who reject any such simple assumption. Nevertheless, as numerous 
studies of backwardness indicate, an upper limit for each individual child is 
undoubtedly set by this innate factor of intelligence, though it is a limit which 
few children, bright or dull, ever actually attain. Moreover, both limits and 
possibilities depend partly on special abilities; and, like intelligence, these 
abilities are also to a large extent genetically determined. Consequently, what 
is actually attainable in the different subjects of the curriculum will be limited 
in very different degrees. 

Dr. Heim and Dr. Harrison apparently imagine that, when we concluded 
that “about 75 per cent of the variance shown by tests of intelligence is attri- 
butable to innate or inheritable causes ”, the figure was meant to specify “ the 
proportional contribution of heredity to the individual’s general educational 
and social efficiency” in every direction. To maintain any such wholesale 
conclusion would, we readily admit, be “ to maintain what is at once meaningless 
and absurd” [21]. Indeed the absurdity becomes obvious as soon as we 
consider what such an inference would entail. Our figure of 75 per cent related 
to the variance contributed by the innate factor to the results of intelligence 
tests. We should therefore have to suppose that the ‘ saturation ' or ‘ loading’ 
for this innate factor in every school subject—reading, composition, arithmetic, 
manual work, and the rest—would not only be the same, but would always be 
unity. Now in various investigations on tests of educational attainments we 
have clearly shown that the saturation coefficient varies widely from one subject 
to another, and that it varies almost as widely for the same subject at different 
ages.
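
The arithmetic behind this objection can be set out in a short sketch. In a single-factor model with standardized scores, the share of a variable's variance contributed by the factor is the square of its saturation, so that the share can be the same in every subject only if the saturations are themselves the same. The subject saturations below are hypothetical values chosen for illustration, not figures from the investigations cited, and the function names are likewise assumptions of the sketch.

```python
import math


def variance_share(saturation: float) -> float:
    """Share of a standardized variable's variance due to a single factor.

    The saturation (loading) is the correlation between the variable and
    the factor, so the share of variance is simply its square.
    """
    if not -1.0 <= saturation <= 1.0:
        raise ValueError("a saturation is a correlation and must lie in [-1, 1]")
    return saturation ** 2


def saturation_for_share(share: float) -> float:
    """Saturation needed for the factor to account for a given variance share."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("a variance share must lie in [0, 1]")
    return math.sqrt(share)


# The first entry reproduces the 75 per cent figure quoted for intelligence
# tests; the remaining saturations are HYPOTHETICAL, for illustration only.
HYPOTHETICAL_SATURATIONS = {
    "intelligence test": saturation_for_share(0.75),
    "composition": 0.8,
    "arithmetic": 0.7,
    "manual work": 0.4,
}


def shares(saturations: dict) -> dict:
    """Variance share of the factor in each variable, rounded to two places."""
    return {name: round(variance_share(s), 2) for name, s in saturations.items()}


if __name__ == "__main__":
    for name, share in shares(HYPOTHETICAL_SATURATIONS).items():
        print(f"{name:18s} {share:.2f}")
```

With saturations that differ from subject to subject, the shares differ correspondingly, which is why a figure obtained for tests of intelligence cannot be carried over wholesale to school attainments.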

Spearman's view no doubt implied, as he himself pointed out, that, in 
theory, every individual could be ranked in a single series which would remain 
unchanged for all forms of cognitive activity. And as a result those who advocate
the use of intelligence tests in the school have generally been credited with 
much the same doctrine. Not unnaturally, the notion that each human being 
is stamped at birth with his own serial number, and has his place irrevocably 
fixed in the queue for relative efficiency, is repugnant to most social writers of 
the present day, whose sympathies lie usually with those in the lower ranks.
However, as must be clear from our previous publications, we should be the 
first to reject this modernized doctrine of predestination. Human beings, if 
they are to be classified at all, will have to be arranged, like the abilities they 


38 Cyril Burt and Margaret Howard 


exhibit, in a ramifying hierarchy, not in a one-dimensional scale: and if some 
particular individual is weak in one type of ability, there are others in which 
he may quite possibly excel. Thus our reply to the problem raised by Heim 
and Harrison would be that very succinctly expressed by Haldane: “the 
question of the relative importance of nature and nurture has no general answer, 
but only a very large number of particular answers ” ([18], p. 36). 


The Analysis of Assessments of Intelligence and the Analysis of Tests 


However, in this paper, as in the preceding, we shall be concerned, not 
with special abilities, but only with general intelligence. Here once again it is 
to be remembered that what the psychologist can directly observe or measure 
is only the phenotype, never the genotype. Hence our first aim has been a 
practical one—to estimate the relative efficiency of the methods available for 
assessing intelligence. We might almost borrow Galileo's explanation when 
he repeated Archimedes’ experiment : “ I am not trying to discover how many 
ounces of gold there may be in the crown, but simply how much accuracy 
there is in the method ". For this purpose we have endeavoured to ascertain 
the comparative merits of the two most obvious procedures: first, the use of 
standardized tests, with the marks taken just as they stand; secondly, checking 
and correcting the marks so obtained in the light of the available evidence. 

Now, what our critics chiefly wish to challenge is the efficacy of the tests 
in common use. Many are disturbed by the idea that, merely on the 
basis of a Binet I.Q., some unfortunate child may be permanently labelled ‘ dull’, 
'subnormal', or even ‘mentally deficient’; others fear that grave injustice 
may be done to youngsters from the underprivileged classes, if selection for 
grammar schools is determined mainly by their performances in tests at the 
11 plus examination. We ourselves, however, are perfectly willing to admit 
that, as a means of estimating genotypic differences, even the most carefully 
constructed tests are highly fallible instruments, and that their verdicts are 
far less trustworthy than the judgments of the pupil’s own teachers; the out- 
standing merit of the tests is that, unlike the teacher’s judgments, they are 
comparable from one school to another. 

Accordingly, our investigations have been directed towards discovering, 
first, how far such tests are likely to be affected by irrelevant influences, and 
secondly how far existing methods of assessment can be improved upon by more 
comprehensive devices. Most psychological investigators have concentrated 
on the first of these two questions only. Almost inevitably the investigator 
who enters the schools from outside is obliged to confine himself to short and 
simple procedures: group tests of the usual type. And here lies the unique
advantage of an officially appointed psychologist who is himself a member of the 
education authority's staff : he not only knows at first hand what are the condi- 
tions affecting the examinees in the various schools, but also possesses the 
authority to extend or repeat his tests and his interviews, and to require from 
teachers the detailed information and the further assistance that he needs. 




The data we have used for the purpose of our various analyses have been
secured in this way; and, having satisfied ourselves that by these means we can
reduce the disturbing effects of environment to relatively slight proportions, we
have gone on to inquire whether the data so obtained (the frequency curves,
the bivariate distributions, and the correlations between relatives brought up
under various conditions, such as twins, cousins, children and their parents,
and siblings living in the same home, in different homes, in residential
institutions, and the like) are consistent with the hypothesis of multifactor
inheritance. That has formed our chief theoretical problem.


Intelligence and Social Class 

Most of the critics we have cited appear to confuse these various issues. 
This is especially noticeable in discussions on the relations between intelligence 
and social class. The hereditarian holds that, in communities where there is 
a good deal of freedom to move from one socio-economic group to another, 
“the class of an individual depends largely, though by no means wholly, on
his innate characteristics and on those of the family from which he springs”
[27]. The environmentalist retorts that the class of an individual is “the
main factor determining his performance in those tests of intelligence by which
his innate characteristics are supposedly assessed”. After all, it is said,
intelligence tests are for the most part designed by academic psychologists, that 
is, by persons with the cultured outlook of the ‘privileged classes’. As a
result, they have “a strong bookish bias, and are decidedly ‘U’” (i.e. upper
class) “in their tone and vocabulary”; and to assume “that such tests measure
intelligence and that what intelligence tests measure is innate” is (so Dr. Heim
declares) to be guilty, not only of ‘verbal imprecision’ but also of ‘personal
bias’ [21]. Professor Zangwill has recently come to the defence of the
test-constructor, and suggested that his bookish bias may be justified by the fact
that our present civilization itself is built up on books: in the near future, he 
thinks, a technological age may lead the psychologist to prefer a more practical 
brand of test. But the psychologist selects his tests and his test-items, not in 
accordance with his subjective notions, but on the basis of an objective item 
analysis and of the correlations of his tests with the main underlying types of 
‘cognitive’ (directive) process: perception, association, coordination, abstraction,
reasoning, noetic synthesis, and the like. Unfortunately in actual practice
the most effective modes of testing cannot always be used; and the defects of
the tests adopted for workaday purposes result usually from the unavoidable
need to adopt something that entails no great expenditure of either time or
money.

All this, however, Dr. Heim seems to overlook. To support her views
she cites the figures published in the inquiry on ‘Ability and Income’ by Burt
[7], which, she says, were “acclaimed as proof of the supremacy of heredity
and the rightness of the existing order of things . . . For such people” (the
psychometrists of the hereditarian school) “equality of opportunity consists
in affording individual children the same opportunities as their parents”;
and thus the investigator mistakenly infers that “ ‘the haves’ inherit from 
‘the haves’ " their apparent intelligence, whereas in point of fact all that they 
inherit are “ more books in the home” and “the means to pay for better 
schooling”. The figures recently collected by Mrs. Floud [17] and Mr. Flann
Campbell [10] in regard to the social composition of the child population in
secondary ‘grammar’ schools have been interpreted in much the same way.

Now we cannot help feeling that it is Dr. Heim who is really guilty of 
“verbal imprecision and personal bias”. When the psychometrist states that
this or that group test ‘measures intelligence’, he merely means that it yields
a more probable estimate of each individual’s intelligence than could be obtained 
for the same group with the same expenditure of time and money. Dr. Heim 
interprets the phrase as though it meant ‘this test measures intelligence and 
nothing else’. None of those she criticizes have ever assumed that “ what 
intelligence tests measure is innate”. And even the most hasty perusal of the 
article that she quotes will show that Burt and his co-workers were far from 
claiming that the figures there given supported “ the rightness of the existing 
order of things ". Indeed, their final conclusion was that, among the non-fee- 
paying population, “ about 40 per cent of those whose innate abilities are of 
university standard fail to reach a university, whereas an equal number from the 
fee-paying classes receive a university education to which their innate abilities 
alone would scarcely entitle them ” (loc. cit., p. 98). Nor is it true to suggest 
that “ the psychologists who design and recommend such tests are themselves 
members of the privileged classes ". In point of fact, as they have gratefully 
recognized, those who have been most active in this field would themselves 
have been unable to enjoy the benefits of a public school or a University had it 
not been for the scholarship system; to quote Godfrey Thomson, this fact
was for him (as it was for several others) the main “incentive . . . in
endeavouring to improve methods of selecting pupils” ([37], p. xv). A glance at the
reports of the L.C.C. Psychologist will show that in actual fact such tests have
served far more frequently to save the backward and the merely retarded from
being certified as mentally deficient and to award the bright pupil from a poor
home a free place in a ‘secondary’ or grammar school when his performance
in the older type of scholarship examination would probably have excluded
him.

But in any case much of this criticism is really beside the point. No 
competent psychometrist would nowadays rely on results collected from tests 
applied primarily for practical purposes, in order to solve his own theoretical 
problems. At best, in an inquiry into the extent or nature of mental inheritance, 
the figures supplied by official examinations can furnish nothing but confirmatory 
evidence. And the material which we ourselves have chiefly used was obtained 
from surveys planned with the theoretical issues specifically in view. 

Some justification for the protests made by Dr. Heim and other
environmentalists can no doubt be found in the exaggerated claims of the older
eugenists: vehement assertions on the one side inevitably provoke vehement
dissent on the other. Pearson, for example, more than once declared, on decidedly
tenuous evidence, that “the influence of environment is nowhere more than
one-fifth of that of heredity, and quite possibly not one-tenth of it” [31].
Sed latet dolus in generalibus: and this sweeping generalization trapped him into
practical corollaries which not unnaturally tended to discredit the whole
movement: “a heredity-hunt”, as G. K. Chesterton observed, “would be a greater
menace than a heresy-hunt” [11]. A few years later Dampier Whetham, one
of the most active champions of eugenics, stoutly opposed all ‘competitive
examinations’ on biological principles; “scholarships” (he warned his readers)
“have their dangers when used to raise those who win them out of their natural
class; they were good sociologists, as well as good divines, who taught the
child to ‘learn and labour truly, and do his duty in that state of life into which
it has pleased God to call him’” [40]. “Francis Galton himself”, Professor
Hogben reminds us, “was a man with a strongly aristocratic bias”. And perhaps
it was not surprising if “eugenics thus became identified with a system of
ingenious excuses for combating the amelioration of working class conditions”
[22]. Yet, when all is said, it seems a little unfair to saddle psychologists of 
the present day with the exaggerated views and social prejudices of the pioneers 
of fifty years ago. Scientific issues cannot be solved by turning them into a 
kind of erudite class-war, patricians versus plebs. After all, the basic error
committed by the earlier hereditarians was much the same as that of their
present-day critics. They equated ‘observable intelligence’ with ‘inherited
intelligence’, the phenotype with the genotype; and, if their theory of
inheritance was grossly at fault, that at least was pardonable when genetics was
still in its infancy.

III. METHODOLOGICAL CRITICISMS


General Criticisms 
The main methodological argument may be briefly summed up as follows.
What the psychometrist calls ‘genotypic intelligence’ is avowedly an abstract
and hypothetical ‘factor’. Now, hypothetical abstractions, like ‘factors’,
cannot be empirically measured; even if they could, the ‘factors’ of the statistical
psychologist are admittedly non-existent figments; and in any case all
endeavours to establish the existence of an innate factor of intelligence have so far
completely failed; therefore, any further attempt to investigate or measure it
is plainly doomed to failure; and the quest may as well be abandoned.


Let us take these allegations one by one. 


The Alleged Impossibility of Measuring Hypothetical Quantities 

No one denies that strictly speaking “an innate quality, like intelligence,
from its very nature cannot itself be measured”. But in that case, says the
critic, “there can be little point in discussing a hypothetical innate ability
unless we can measure it”, particularly as the only method of verification
available must itself depend on the assumption that the quantities involved are
to be measured. This difficulty was urged by several contributors to the
Symposium arranged by Section J (Psychology) of the British Association at 
its Newcastle meeting; and more recently Dr. Maddox and two or three of our 
correspondents have maintained [26] that any quantity which cannot be directly 
measured or ‘operationally defined’ must be ‘ metaphysical’, and that it is 
“a postulate of modern scientific methodology that all such unverifiable 
concepts should be discarded out of hand, together with the ether and similar 
hypothetical figments ". 

Now we doubt whether the majority of scientists of the present day would 
subscribe to the ‘ postulate’ on which this argument is founded, although 
twenty years ago the proposal to dispense with all but directly verifiable concepts 
enjoyed for a time a widespread vogue. Contemporary physics, particularly 
quantum physics, is full of hypothetical quantities which at best can only be 
estimated with varying degrees of accuracy. Energy, to take the most familiar 
concept, cannot itself be directly measured. Like intelligence, it is, by defini- 
tion, a ‘capacity’, namely, a ‘capacity for work’. Consider the ordinary 
methods of determining the amount of energy supplied, say, by a Leclanché 
or Daniell’s cell. As the student of physics is warned, the laboratory methods 
commonly employed are valid only on the assumption that the energy from the 
chemical changes is wholly converted into energy of current. But, strictly 
speaking, this is never the case; and various ‘errors of measurement’ have 
always to be allowed for. This is exactly parallel to the psychologist’s problem. 
The ‘ capacities’ with which he is concerned can never be directly measured, 
but only assessed. Nevertheless, provided he can devise some means of judging
the probable margin of error, this in no way precludes him from using such
assessments in his statistical proofs.
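
One conventional way of judging such a margin, offered here only as an illustrative sketch and not as the procedure followed in the surveys discussed, is the standard error of measurement of classical test theory: the standard deviation of the scores multiplied by the square root of one minus the reliability coefficient. The figures used in the example (a standard deviation of 15 and a reliability of 0.91) are assumed for illustration, and the function names are assumptions of the sketch.

```python
import math

# Half of a normal distribution lies within +/- 0.6745 standard deviations,
# which is the conventional definition of the 'probable error'.
PROBABLE_ERROR_FACTOR = 0.6745


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    if sd < 0:
        raise ValueError("a standard deviation cannot be negative")
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("a reliability coefficient must lie in [0, 1]")
    return sd * math.sqrt(1.0 - reliability)


def probable_error(sd: float, reliability: float) -> float:
    """Margin within which half of the errors of assessment should fall."""
    return PROBABLE_ERROR_FACTOR * standard_error_of_measurement(sd, reliability)


if __name__ == "__main__":
    # Assumed illustrative values: SD of 15 points, reliability of 0.91.
    sem = standard_error_of_measurement(15.0, 0.91)
    pe = probable_error(15.0, 0.91)
    print(f"standard error of measurement: {sem:.2f}")
    print(f"probable error: {pe:.2f}")
```

A perfectly reliable assessment would have a margin of zero; the lower the reliability, the wider the margin that must be allowed for before an assessment is used in a statistical argument.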


The Alleged Non-existence of Mental Factors 


Many of our methodological critics would readily allow that certain 
hypothetical quantities may, in some sense or other, be measurable; but they 
insist that such quantities form a strictly limited group. Even if there is no 
direct observational evidence that the quantity in question does exist, yet (so 
they argue) that quantity must be a thing that might exist; and, when it is 
introduced as part of a working theory in science, there ought in addition to 
be some factual evidence to show that it probably does exist. Critics who pursue 
this line usually quote Thomson’s warning about the dangers of ‘reifying’
abstract factors: “the psychologist's factors are statistical artefacts; a man
does not have a g in the sense that he has a liver” [37]. Moreover, as they
point out, the variable that appears in our tables is “an abstract factor twice
removed”: it is the genetic component (or sub-component) of the general
cognitive component. Such a rarefied abstraction, they contend, is “nothing
but the ghost of a ghost”, and can have “no real existence”.

Here it seems helpful to distinguish between the mathematical model on
which our proofs are based and the ‘real’ or concrete phenomena to which the
model refers. Practically all scientific theories are founded on symbolic models
of this kind—highly simplified constructs formed out of highly abstract concepts, 
skeletons with the flesh all scrupulously removed. They are essential for the 
preliminary deductions; and, in order to apply or verify the corollaries so 
deduced, all that is logically necessary is that the algebraic variables defined in 
the model should correspond with, or, as the phrase goes, ‘be isomorphic with’,
the essential features of the real phenomena. There is no need for them to be 
directly correlated with what is actually observed or measured : they may simply 
represent what Broad has called “dispositional properties ’ (EJ, pp. 434f.). 
And, as he reminds us, it is distinctive of modern, in contrast to mediaeval 
science, to interpret causal properties of this kind (including what mediaeval 
writers would have called the ‘powers’, ‘faculties’, or ‘capacities’ of the
mind) in terms of the minute spatio-temporal structure of the material substances 
that presumably serve as vehicles for such properties. That, as we have argued 
elsewhere, would be a perfectly feasible way of interpreting the particular 
‘dispositional property’ which we have designated ‘intelligence’.

But, of course, this version of the situation demands a separate set of proofs, 
to show how the factors in the model are related to, or represent, the 
characteristics in the actual persons that we test. Oblivious of this gap in their 
arguments, statistical psychologists are too apt to confine themselves to purely 
statistical considerations; and are often content to identify their ‘factors’
straightaway with supposed psychological characteristics on no other evidence 
than that afforded by the names of the tests employed. In our view, such 
identifications always call for the support of non-statistical evidence—obser- 
vational, introspective, experimental, neurological, or the like [8]. 

In an instructive paper, Mr. Miles has argued that hypothetical entities
like the factor of intelligence may be described as having ‘real existence’
“only when they are not ‘factors of the mind’, but ‘factors of the body’, i.e.
dependent on the ‘discoverable behaviour of genes or neurones’” [29]. We
fully agree that this is by far the most satisfying way of conceiving the
dispositional properties that concern us here; and we hold that a plausible
interpretation of this kind is actually furnished by the multifactor theory. Could
this theory be adequately verified, we might even define an individual's innate
intelligence (gamma) as proportional to the number of gamma-genes in his
genetic constitution, where a gamma-gene is conceived as part of the
ultra-microscopic structure of the chromosome, and may itself be defined as an
allelomorph whose presence adds a small positive quantity to one and the same
specifiable mental capacity, namely, the capacity to perform all kinds of
cognitive task. It will no doubt be objected that to count the actual number of
such genes would be an impossible feat. True; but it is equally impracticable
to count the number of molecules in one cubic millimetre of a given substance;
yet the molecular density of a gas is defined in terms of the number of molecules
contained in such a volume. In both cases what the scientist does is to make,
not a direct count, but an indirect estimate of the number in question.

¹ For our purposes it will be sufficient to define a ‘dispositional property’ as one
which requires a hypothetical proposition to describe it. Thus, when we ascribe to
a steel spring the property of ‘elasticity’, we mean: “if such a spring is pressed,
[...] strained, and then released, it will (unlike a piece of putty) return more or
less to its original shape”. The phrase ‘more or less’ implies that the property
has varying degrees; and therefore, although it is a hypothetical abstraction, is
susceptible of measurement; in fact, the degree of elasticity may be expressed by
various ratios or ‘moduli’: e.g.

$$\frac{\text{stress}}{\text{strain}} = \frac{\text{pull per area of cross section}}{\text{change in length per unit length}}$$
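
The additive definition of gamma given in the text can be illustrated numerically. The sketch below is a toy simulation under loudly simplified assumptions that are not part of the argument itself: every gamma-gene is taken to contribute one equal unit, each allelic position carries such a gene independently with the same probability, and dominance, linkage, and environmental effects are ignored. The function names, the number of positions, and the gene frequency are all hypothetical.

```python
import random
import statistics


def simulate_gamma(n_positions: int, p: float, n_individuals: int, seed: int = 0) -> list:
    """Count the gamma-genes carried by each simulated individual.

    Each of the n_positions allelic positions independently carries a
    gamma-gene with probability p (an assumed, hypothetical frequency).
    """
    rng = random.Random(seed)  # seeded so that the run is reproducible
    return [
        sum(1 for _ in range(n_positions) if rng.random() < p)
        for _ in range(n_individuals)
    ]


def summarize(counts: list, n_positions: int, p: float) -> dict:
    """Compare the simulated counts with the binomial expectation."""
    return {
        "observed_mean": statistics.fmean(counts),
        "expected_mean": n_positions * p,
        "observed_variance": statistics.pvariance(counts),
        "expected_variance": n_positions * p * (1.0 - p),
    }


if __name__ == "__main__":
    counts = simulate_gamma(n_positions=100, p=0.5, n_individuals=2000, seed=1)
    for name, value in summarize(counts, 100, 0.5).items():
        print(f"{name:18s} {value:8.2f}")
```

Under these assumptions the counts follow a binomial distribution, which for a large number of positions is closely approximated by the normal curve; no individual gene is ever counted directly, yet the mean and variance of the total can be estimated from the distribution of the scores.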


The Alleged Non-existence of ‘ Innate Intelligence’ 


Dr. Maddox [26] endeavours to reinforce the criticisms we have been 
discussing by giving the methodological argument a more empirical turn. 
After prolonged research by various methods, he says, “ investigators have failed 
to produce any convincing evidence about the genetic components of intelli- 
gence”. Hence our own “attempt to resuscitate the nature-nurture contro- 
versy " he condemns as quite “ anachronistic”. A wiser methodology would 
leave such purely conjectural factors in the nebulous limbo to which they belong, 
and would concentrate exclusively on environmental factors. “ Environmental 
factors ", he points out, “ are capable of control; and for this reason are of 
greater interest to educators". He therefore fully applauds the conclusion 
expressed by H. E. Jones “that potential contributors should now be advised
that the nature-nurture controversy has been shown to be an unproductive
field of research”.

This would be an easy way of cutting the Gordian knot. But a policy of 
contented nescience would mean renouncing every hope of solving numerous 
problems which are of first importance alike for theory and for practice. The
mere fact that environmental conditions are more amenable to control, and 
hence of paramount interest to the educator, cannot of itself absolve the 
psychologist from the task of investigating the possible influence of genetic 
factors. In dealing with any particular child¹, it is as necessary to consider
what cannot be done as to know what can. If Tommy’s backwardness is due 
to an innate lack of intelligence, then it is useless, and worse than useless, to 
strive to transform him into a normal pupil. If Harry is not endowed with the 
ability requisite for the work in a grammar school, it is folly to transfer him 
thither simply because there is a slender chance that “even the dullest youngster
may be a genius in disguise”. It is only in fiction that the hundred-to-one
outsiders romp home in the final lap, and leave the experts gaping. 

¹ For conflicting opinions on this point, see the articles and correspondence on
‘Dullness and its Causes’ in the recent issues of the Times Educational
Supplement, October 19 [...].

Our own explanation for the failure of previous research would be very
different from that which Dr. Maddox has offered. The obvious reason surely
is that the investigators to whom he refers were all thinking in terms of
out-of-date concepts of heredity: even had they wished to substitute a Mendelian
theory, they possessed no adequate techniques for coping with multifactorial
problems. Dr. Maddox, however, apparently believes that, from the very
nature of the issues involved, it is impossible to secure evidence that might
justify a hereditarian interpretation; and his scepticism, as he points out,
can claim the sanction of most American psychologists. Nevertheless, it no
longer commands the support of American geneticists. Professors Goldschmidt,
Snyder, Calvin Hall, and Calvin Stone all deplore the neglect of genetic
investigation by psychologists of the present day. Goldschmidt himself traces this
general lack of interest to the fact that empirical psychology began as an offshoot
of philosophy, at a time when associationism held the field: translated into
physiological terms, the associationist theory gave rise to a ‘doctrine of universal
conditioning’, which has lately dominated the American behaviourist school;
and as a result practically all individual differences are interpreted in terms of
postnatal experience. “This”, he writes, “may have been responsible for
the extreme belief of our contemporary psychologists in the power of
environment”. Nevertheless, “with a little prodding from genetics, both human
and animal psychology might increasingly assimilate genetic notions, and be
stimulated by them for mutual benefit” ¹.

What, then, in the most general terms, are the methods of the present-day
geneticist, and what grounds have we for supposing that they cannot be
profitably borrowed by the psychologist? Broadly speaking, there are four
main lines of approach: (i) cytological analysis, the study of cell structure;
(ii) the method of uniform environment with varied heredity; (iii) the method
of uniform heredity with varied environment; (iv) genetic analysis, the study
of the effects of the segregation and recombination of genes in successive
generations (cf. [36]). Man is a slow-breeding animal of low fertility; he
cannot be readily subjected to experimental control; and his behaviour is
governed by post-natal experience to a larger extent than that of any other
creature. These peculiarities render the psychogeneticist's task more complex;
but they do not make it impossible. The validity of such methods for the study
of mental characteristics is a matter to be decided, not by the abstract canons
of some preconceived methodology, but by the pragmatic principle of giving
them a trial, and noting whether they yield any worth-while results.

The evidence we have offered in support of our own hypothesis has been
derived from all four lines of approach. (i) In the case of man, cytological
knowledge is still lamentably defective; yet it is sufficient to demonstrate that
the mechanisms of heredity and variation are essentially the same in the human
species as in all other bisexual creatures. We have relied on it chiefly to show
(a) that there is a strong antecedent probability that a characteristic like
intelligence, which appears to be continuously graded and normally distributed,
is determined, or at any rate influenced by, a multiplicity of genes, and
(b) that the number of genes or ‘factors’ is large enough to account for the wide
range of differences and the smooth distribution curves that are observed.
(ii) Studies of children brought up in residential institutions where
environmental conditions are relatively uniform have disclosed wide individual
differences in intelligence that cannot possibly be ascribed wholly to environmental
influences, and are closely correlated with those exhibited by [...].
(iii) Studies of monozygotic twins reared in different environments have been
used to assess the effects of environmental differences on (a) intelligence tests
of the ordinary type and (b) adjusted assessments based on all the evidence
available for each individual child. (iv) In the main, however, the procedure
adopted by the human geneticist must necessarily be statistical; and the
methods of genetic analysis developed by the modern biometric school have
already proved their efficiency in analogous fields of work. Today, geneticists
who work with cattle or with crops are constantly attempting to determine the
strength of whatever genetic quality it may be that makes for a good yield in
this or that breed and in this or that strain: there can surely be no a priori
objections to extending their methods, with appropriate modifications, to the
problems of human psychology.

¹ Genetics in the Twentieth Century (1951), p. 7. In the same volume [...]
how, “after a short flurry of interest in human heredity during the first decade
of [...] century, that interest, particularly in America, proved to be short lived
[...]. More recently, however, centres of research in human genetics have been
established in the United States, and by 1948 an American Society of Human
Genetics had been formed. Although hitherto engrossed with the problems of
physical and pathological heredity, its members are already beginning to ask
whether it may not be time to embark on genetic studies of [...]
characteristics”.


IV. STATISTICAL TECHNIQUES

General Nature of the Techniques Employed

This brings us to the type of criticism most frequently advanced, namely,
that directed against the bi[...] During the earlier p[...] problems, initiated
by Galton, Pearson, and Weldon, was quickly overshadowed by the more direct and
frui[tful] [...] States. The interest in statistic[al] [...] has been widely
interpreted by psycholo[gists] [...] Today, however, [...] problems of
multifactorial inheritance [...] statistical techniques are indispensable, and those
developed by Fisher and his followers now command almost universal assent.

Yet they have not esca[ped] [...] themselves. In this count[ry] [...]
[...]nation of their views the reader must be referred [...] the replies of the
writers th[...] difficulties which they emphas[...]


Heredity and Intelligence: A Reply to Criticisms 47 


A. Objections to the Analysis of Correlated Variables. Several of our critics cite the final chapter of Hogben’s book on Nature and Nurture [23] as if it completely disposed of any attempt to apply Fisher’s methods to such problems as our own. Dr. Maddox, for example, in discussing what he describes as our ‘balance sheet for nature and nurture’, claims that Hogben’s arguments conclusively demonstrate that the whole procedure is ‘logically untenable’.¹ Hogben himself never goes so far as this. Indeed, in the chapter quoted, he expressly admits that “the technique of correlation can be used to draw attention … is beyond dispute”. His chief contention is that Fisher’s method of analysis, at any rate in its original form, is legitimate only when we are able to “deal with nature and nurture as independent variables”; when nature (or ‘heredity’) and nurture (or ‘environment’) are themselves correlated, then (so he maintains) the results may prove deceptive. “The data on which Fisher’s own analysis for stature is based were”, he points out, “collected from a comparatively …”; i.e. from a group in which the relevant environmental conditions were much the same for all; but were we to apply it to “human populations in general”, we should be forced to recognize that … of genetic factors, we must be cautious about using them to indicate its amount; and thus, whatever significance Fisher’s attempt to draw up “a balance sheet of nature and nurture” may have for the study of human characteristics, “it cannot entitle us to set limits to the changes that might be brought about by …”.

¹ The book quoted by Dr. Maddox and Dr. Heim consisted of semi-popular lectures, addressed to a medical audience nearly twenty-five years ago, and was concerned, not so much with the technical merits of Fisher’s mathematical procedure, but rather with the practical inferences which had rashly been drawn from it. Hogben’s own views are more systematically presented in the larger textbook cited in the next paragraph.
It should be added that, since the above was written, Dr. Maddox has slightly revised the wording of his argument: but his essential conclusions remain the same. Our quotations were taken (with his permission) from the earlier version.

In his larger volume on Genetic Principles Hogben sounds a more optimistic note. He assures us that “from a theoretical standpoint the prospect is brighter” than it formerly appeared. In his opinion, it is now both possible and necessary “to develop methods which will enable us to assess the relative importance of nature and nurture in a specified range of conditions”. Among the procedures available he includes the study of “observed differences in performance of psychological tests”, such as he himself has used in his investigations on twins; and he advocates five particular types of inquiry “which, when applied with proper safeguards and interpreted with discrimination, will”, he believes, “yield unequivocal data”. These are the study (i) of twins and siblings, (ii) of adopted children, and of the effects of (iii) inbreeding, (iv) outbreeding, and (v) order of birth; and under each heading he cites one or two model researches, usually from the domain of animal genetics. These lines of inquiry, it will be seen, are much the same as we ourselves had tentatively employed in our earlier investigations. And, though Hogben himself might not approve the details of our work, there is no difference of principle between us. In short, the prophet whom Dr. Maddox has called in to curse such enterprises seems more disposed to grant them his blessing.

Hogben ends with a plea which we ourselves should like to second as strongly as we can. After deploring the lack of “University machinery” for giving instruction in modern genetics to students of medicine, social psychology, and allied subjects, he goes on to urge, in the most convincing terms, the importance of independent provision and special centres for research in the field of human genetics. It is, as he says, a subject which demands “organized teamwork on a very large scale”, including the “collaboration of geneticists, statisticians, educational psychologists”, and other qualified persons. It will therefore be costly. Yet “what we know already liberally encourages us to hope we may eventually know much more”.

All this seems far more reassuring in its tone than Dr. Maddox’s quotations might suggest. No psychologist would quarrel with the cautionary remarks which Hogben makes in criticizing the unwarranted deductions drawn by the older enthusiasts. We ourselves have never sought to apply the method to “human populations in general”. And the difficulties that arise when environment is allowed to vary unquestionably call for special treatment. They can, we believe, be satisfactorily met in two alternative ways: (i) by modifying the design of the experiment; (ii) by amplifying the statistical technique.

(i) When planning the inquiry, we may seek to exclude any irrelevant correlation with environment by selecting either the sample of persons or the sample of tests. Thus, when using tests that are not ‘culture-free’, we could for purposes of theoretical research select children coming from a restricted type of environment in the hope of eliminating in advance any cultural differences that might affect the tests. On the whole, however, we think it best to substitute, where possible, tests (or other methods) of such a nature that cultural differences are not likely to affect the results.

(ii) In studying, for more practical purposes, the value of marks obtained with tests of the ordinary type, we have endeavoured to supplement Fisher’s original procedure so as to cope with the further variables involved. To check the adequacy of the method, we proposed an alternative analysis in terms of … parallelogram … . Here, instead of starting with the correlation between assessments and environmental differences, we deduced it; and … value computed directly from ad hoc … .

… one or two further objections to Fisher’s procedure. Since, he says, when judged by the correlation actually obtained, ancestral heredity will account for little more than 50 per cent of the total variance in bodily characteristics, Fisher is obliged to “invoke other factors, such as segregation and assortative mating, to avoid attributing the balance … between the environmental and the genetic components. Moreover, so Dr. … “initial step” (namely, ascribing the correlation between children and their parents to genetic factors) “is suspect, because of the … …lian, but pre-Mendelian. Does Dr. Maddox really hold that Fisher should have discarded the concept of ‘segregating factors’ in favour of the older theory of ‘blending factors’, a theory rejected by every modern geneticist? (ii) The data for bodily characteristics which Fisher used were obtained, as we have seen, from a “fairly homogeneous middle class group”. Hence with this particular group the supposed differences between “favourable and unfavourable environment” did not in fact exist. Further, it is quite erroneous to assume that the effect of environment inevitably increases the correlation found. As Yule points out, both environment and dominance may tend to reduce it. (iii) Finally, Fisher readily admits that some of the genetic components are only “imperfectly additive”, and has therefore introduced a specific procedure which enables him to treat the irregular effects of interaction as an independent residual component.

B. Objections to Second Degree Statistics. The most direct attack on the type of approach that we have used is contained in a provocative paper contributed by Dr. Bernard Woolf to the Edinburgh Colloquium on ‘Quantitative Inheritance’ ([33], pp. 81-102). His argument is divided into two parts: in the first he challenges the principles and assumptions adopted by nearly every British writer who has discussed quantitative inheritance in man (Fisher, Fraser Roberts, Thomson, and Burt); in the second he examines the specialized techniques initiated by Fisher, developed by Mather, and “common to current workers in biometrical genetics”. The final outcome, so Dr. Woolf tells us, is to enforce two main contentions: “the first is that real genetic situations with environmental and other interactions include too many variables to be amenable to the kind of analysis developed by Fisher and his school; the second is the utter inadequacy of orthodox analysis of variance, when confronted with a real world not arranged in Latin squares with all effects strictly additive”.

(i) The Inheritance of Intelligence. All too often our various critics seem to take their notions of what their opponents have said, not from their opponents’ publications, but rather from each other; and Dr. Woolf’s somewhat garbled version of our views has been so readily accepted by later writers that it is necessary at the very outset to emphasize how often he misinterprets what we have actually said. He opens with a detailed discussion of an ever-popular problem, the alleged decline of national intelligence; and attacks what he takes to be the solution propounded by those who adopt the ‘multiple or additive factor hypothesis’. Nowhere does he explicitly state how he himself understands the term ‘intelligence’; but his comments imply that he sides with “those who prefer to re-define intelligence as ‘what intelligence tests measure’”. However, little or no interest can attach to the question whether the mere capacity to answer tests has improved or diminished. The problem raised by the writers he quotes, and revived by the inquiries of the Royal Commission on Population, related to what the Commission itself called ‘innate intelligence’. Burt’s abridged memorandum, to which Dr. Woolf chiefly refers, gives a formal definition in the opening paragraph; and the Appendix ‘On the Meaning of Intelligence’ emphatically rejects the alternative, which Dr. Woolf seems to prefer ([41], pp. 68-75). Of course, if Dr. Woolf insists on starting off with a ‘re-definition’ of the crucial term, it becomes the easiest thing in the world to show that our argument leads to a reductio ad absurdum.

However, besides misinterpreting the key-word in the discussion, he also gives an unfair twist both to the evidence adduced and to the final conclusion. In summarizing the evidence on which his opponents chiefly relied, he tells the reader that “all their predictions have been purely inferential, based on test scores taken at only one point in time” (loc. cit., p. 84, our italics). This statement contains at least three misleading implications. First of all, in the London investigations at any rate, ‘the predictions’ were not based solely on ‘test scores’. In the memorandum and elsewhere a clear distinction was drawn between ‘the crude test results’ and the ‘adjusted’ or ‘final’ assessments. In doubtful cases different types of test were used at different ‘points of time’; and in every case the assessments were submitted to the teachers, and, where necessary, corrected in the light of fuller information.

Nor is it true that ‘the predictions’ were ‘purely inferential’. If Dr. Woolf will turn to the section in the Memorandum entitled ‘Estimating the Amount of Apparent Decline’, he will see that two methods of prediction are discussed. The first, headed ‘Indirect Estimation’ (p. 69), was ‘purely inferential’; but the calculations were immediately qualified by the comment that on such matters armchair deductions cannot be trusted without confirmation. Had the calculated changes actually occurred, they would have involved a marked increase in the number of defectives and an equally marked decrease in the number of scholarship winners; and this, it was argued, “the … head teachers and officials would assuredly have noticed”. An alternative method was therefore used in the hope of “securing direct confirmation”. This is described under the heading ‘Direct Estimation’ (pp. 6lf.), and consisted of repeated surveys carried out at two or more widely separated ‘points of time’. The result was a very different set of figures. The conclusion ultimately reached was that “far more extensive studies are required”.



After sounding his ‘stern warning’ against the ‘fallacy’ of indirect inference, Dr. Woolf goes on to say: “Still less was it legitimate for Thomson and Burt, without any tests whatever of the parents, to visit upon them the marks scored by their children. ... Their anticipations of decline might have been justifiable as pessimistic speculations, though the great confidence placed in them would not have been” (p. 86, our italics). But in each of our surveys, assessments were individually obtained for a representative sample of parents, checked, for purposes of standardization, by tests of the usual type ([41], p. 52, [8], p. 172). How else does Dr. Woolf suppose the correlations between children and parents were computed? Moreover, the burden of the whole memorandum was to deprecate any feeling of ‘confidence’. And the opening summary plainly announced that the evidence at present available was “far from conclusive”. Hence the reiterated plea for research.

In discussing the influence of genetic and non-genetic factors on individual differences in intelligence (or rather on the assessments of those differences), the “first conclusion” that Dr. Woolf wishes to enforce is that “Burt seriously underestimates the amount of population variance attributable to environmental influences”. To establish this point, he begins by examining “the study by Barbara Burks on which he (Burt) bases such far-reaching conclusions” (loc. cit., p. 87). Dr. Burks’ investigation, we are told, was carried out on only “214 foster children with 105 children as a control group”. But the principal basis for Burt’s conclusions was not this American inquiry, but a series of investigations conducted with the help of several co-workers, and covering nearly 600 children in “residential schools, orphanages, and foster homes where the inmates are received during early infancy” and including a far larger sample of controls attending ordinary L.C.C. schools. This itself was merely supplementary to a mass of other data (correlations between parents and offspring, siblings, twins, and the like) to which Dr. Woolf nowhere alludes. The American inquiry was only cited at the end of the Appendix to show how an independent investigator in an entirely different country had arrived at much the same figure as that derived from the London surveys.

Dr. Woolf next turns to Burt’s table for the mean assessments for children classified according to their father’s occupation. He admits that the total range, from those belonging to “higher professional or administrative” classes at one end of the scale to “casual labourers” and “institutional defectives” at the other, is “enormous ... amounting to more than 30 points”. But, he asks, is it not possible to account for “all the observed differences by environmental effects ... without any need to bring in genetical differences”? To support this suggestion he quotes figures from various sources (including Burt and Thomson); but all that these figures show is that environment may appreciably affect uncorrected test-results, a point which of course was never in dispute. He “freely acknowledges that all these explanations” (i.e. explanations he himself would favour), “differences in parental attention, birth-rank effect, pre-natal disadvantages, etc., are speculative”; and certainly the data he cites would seem rather to indicate that these various ‘non-genetic influences’, though undoubtedly present, are far smaller than the genetic.

However, this is not our main objection. What we demur to is the tenden- 
tious way he frames his question. The implication is that there are only two 
alternatives: (i) that all the differences are to be accounted for by heredity; 
(ii) that all of them are to be accounted for by environment—a most misleading 
antithesis. Surely Dr. Woolf would not pretend that intelligence differs from 
every other human characteristic in being totally unaffected by heredity. And 
certainly he does not deny that differences in the apparent intelligence of 
children are associated with differences in the genetic constitution of their 
parents. Why then should we not invert his question, and ask: “is it not
possible to account for all the observed differences by the genetical differences
without bringing in environmental influences as well?”

But, of course, both questions fallaciously imply that the two modes of 
causation must be mutually exclusive. For such an assumption there is not 
the slightest ground. Indeed, it must be obvious to anyone, who is not already 
strongly biased towards one explanation or the other, that both types of influence 
are generally at work. And the question to be asked is not : could the difference 
be accounted for wholly in this way or wholly in that, but what is the most 
probable way of apportioning the results between the first type of cause, the second 
type of cause, and the mutual interaction of the two. 

"Throughout this part of his paper much of the apparent plausibility of the 
arguments employed rests on a fallacy so often committed by psychologists and others who have not themselves had experience of mental testing. He treats ‘the I.Q.’ as a kind of absolute measurement which is necessarily the same for the same child regardless of the method by which it has been obtained. Thus conclusions drawn by one writer who is referring to I.Q.’s obtained with group tests are applied to other I.Q.’s obtained by another writer who used individual tests; conclusions drawn from verbal tests are applied to I.Q.’s obtained with non-verbal tests; and conclusions drawn from uncorrected I.Q.’s obtained from any test are assumed to hold good of composite assessments, carefully checked, where the I.Q. forms merely a conventional unit. Hardly ever, when quoting an I.Q., does he mention how it was secured. With the original Binet tests (with which some of the quoted results were obtained) the standard deviation was less than 13 I.Q. points; with the tests used by Cattell it was over 20; for our adjusted assessments (based originally on percentile ranks) we used an arbitrary I.Q. unit with a standard deviation of 15. Dr. Woolf treats these and other I.Q.’s as though they were all equivalent. That, of course, would make nonsense of any re-allocation of the variances.
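
The non-equivalence of such units can be illustrated numerically. What follows is a minimal sketch and no part of the original analysis. It assumes that every scale has a mean of 100 and that scores may be treated as normal deviates, and it takes standard deviations of 13, 20, and 15 merely as round illustrative figures for the three scales just mentioned (the text states only ‘less than 13’ and ‘over 20’). The function name `rescale_iq` is hypothetical.

```python
def rescale_iq(score, sd_from, sd_to, mean=100.0):
    """Re-express a score from a scale with SD `sd_from` on a scale with SD `sd_to`.

    Assumption: both scales share the same mean and differ only in spread,
    so the score is converted through its standardized deviation (z-score).
    """
    z = (score - mean) / sd_from
    return mean + z * sd_to


# Illustrative standard deviations only; the source gives bounds, not exact values.
BINET_SD = 13.0
CATTELL_SD = 20.0
ADJUSTED_SD = 15.0

for label, sd in (("Binet", BINET_SD), ("Cattell", CATTELL_SD), ("adjusted", ADJUSTED_SD)):
    # The same nominal figure of 130 expressed in the 15-point unit.
    print(label, round(rescale_iq(130.0, sd, ADJUSTED_SD), 1))
```

On these assumptions a nominal I.Q. of 130 corresponds to about 134.6 in the 15-point unit if it was obtained on a 13-point scale, but to only 122.5 if it was obtained on a 20-point scale; variances computed from unconverted figures would be distorted accordingly.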

(ii) The Analysis of Variance. In attempting to assess the relative influence of genetic and non-genetic factors, we ourselves relied almost entirely on an analysis of variance; and the second part of Dr. Woolf’s paper is devoted to a spirited attack on this procedure.

His first and main objection to what he describes as ‘the red herring of variance analysis’ is that it commits us almost inevitably to an underestimate of those environmental influences that he himself has enumerated. It is, he says, a necessary condition of the statistical technique adopted by Fisher and his school that “environment should appear as a random error which can then be eliminated by arithmetical legerdemain rather than as an integral part of the situation to be embodied in the conclusions”. Here, beneath the phrase ‘rather than’ there lurks the same fallacy as before: the tacit substitution of the ‘exclusive (or alternative)’ or for the ‘inclusive’. From the standpoint of the statistician, environment may be conceived as influencing individual variations in two distinct ways: first, by its ‘systematic’ effects (as ‘an integral part of the situation’), and secondly by its ‘random’ effects. These are two separable factors (though in fact they are usually found together); they are not two alternative methods of treating one and the same factor. Because they are separable, it is possible, by a suitable choice of cases, to get rid of the systematic effects before ever we proceed to the statistical analysis. When this has been done, it is no longer true to say that what is an ‘integral part of the situation’ now ‘appears’ as a random factor. It has been eliminated in advance; and the ‘random’ part is merely what is left. Moreover, in the material with which Fisher was dealing, the elimination could not have been the result of Fisher’s statistical technique, since it had already been effected by the manner in which Karl Pearson had actually chosen his sample.

In another part of his paper, Dr. Woolf seems vaguely aware of this point, and puts his criticism in a different form. If, he says, we intend to assume (as Fisher does in the case of bodily measurements and as we have done in the case of intelligence) that the trait to be analysed is “influenced by both heredity and environment, then environment must be given primary place in the analysis: it is not legitimate to postulate genetic differences until it has been shown that environmental influences are unable fully to account for the variations in the trait; even then, heredity can only be invoked to account for the residual differences” (our italics).

Now, had the researches which Dr. Woolf is criticizing been designed to study the various processes by which environment can affect the phenotypes, then the principle he thus prescribes would no doubt have been appropriate. But their purpose was quite different. Fisher’s object was to demonstrate that his own theory of human inheritance accounted for the data obtained by Pearson far better than the theory that Pearson himself had championed. We in turn were anxious to see whether much the same form of the multifactor theory might not account for the parallel figures obtained with assessments of intelligence. But we had an ulterior aim as well, namely, to determine, within rough limits, what proportion of the total variance was attributable to genetic factors in the case of (i) raw measurements obtained from tests and (ii) adjusted assessments based on all the evidence accessible for each individual child. Surely it is sound methodology to eliminate so far as possible, and for the rest to randomize, environmental effects when the object is to analyse heredity, just as it would be to eliminate or randomize genetic effects if the object was to analyse environment.




Dr. Woolf, however, declares that “randomization of environmental effects is tantamount to saying that there is no hope of ever understanding that aspect of the situation”. Nevertheless, in other researches (e.g. Burt’s studies of backward children and of those mistakenly diagnosed as mentally deficient, where the problem was to analyse the manifold ways in which environment might mould the child’s mental development and modify his performances in tests of various types) the operation of ‘that aspect of the situation’ was very intelligibly explained. In any case, this part of Dr. Woolf’s argument surely flies in the face of his other contention, the contention which he himself places first. He started, it will be remembered, by declaring that “real genetic situations . . . contain too many variables to be amenable to the kind of analysis developed by Fisher and his school”. And that is precisely why, on our view, when planning such inquiries, it is essential to simplify the issues, and deal with separate aspects of the situation in separate inquiries. But, if in the same research we try to embody environment ‘as an integral part of the conclusion’, that must inevitably multiply the number of variables instead of diminishing it.

In order to demonstrate in greater detail the fallacies involved in using ‘second order statistics’ (i.e., variances and coefficients of correlation) Dr. Woolf proceeds to scrutinize at some length the illustrative example worked out by Mather in his book on Biometrical Genetics. This he takes to be typical of the methods adopted by current workers in that subject. He himself contends that in all such investigations ‘first order statistics are essential’. Why precisely first order statistics should be exempt from the disadvantages of second order statistics is left rather in the dark. Indeed, as Dr. Mather observes, Dr. Woolf’s mode of argument seems to “contribute more to a trenchant style than to a scientific appraisal”; and in his criticisms he too often “actively misrepresents both the assumptions made and the methods used”. In regard to the issues with which we ourselves are concerned, there would seem to be little difference between the results obtained by the two types of statistic. Suppose, for example, we wanted to determine whether the test-performances of children reared in foster-homes were superior or inferior to those of children reared by their own parents, we could either compare the two means (as Dr. Woolf recommends) or we could carry out an analysis of variance: with either method the answer would of necessity be identical. Where more than two means have to be compared, an exact algebraic identity no longer obtains; but there can be little question that for general purposes the second procedure is at once the simpler and the more efficient. Thus, the contrast between first degree and second degree statistics turns out to be rhetorical rather than real.
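
The identity asserted for the two-group case can be checked directly: with two groups, the variance ratio F of a one-way analysis of variance equals the square of the pooled-variance t statistic for the difference between the means. The sketch below uses invented scores and hypothetical function names (`pooled_t`, `one_way_f`); it is an illustration under those assumptions, not a re-analysis of any data cited here.

```python
from statistics import mean


def pooled_t(a, b):
    """Student's t for the difference of two means, using the pooled variance."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_within = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    pooled_var = ss_within / (na + nb - 2)
    return (ma - mb) / (pooled_var * (1.0 / na + 1.0 / nb)) ** 0.5


def one_way_f(groups):
    """Variance ratio (between-groups mean square over within-groups mean square)."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Invented scores for illustration only.
foster_home = [98, 104, 101, 95, 110, 99]
own_home = [102, 108, 97, 111, 105, 100, 106]

t = pooled_t(foster_home, own_home)
f = one_way_f([foster_home, own_home])
print(t * t, f)  # the two figures agree up to rounding error
```

Since both statistics are referred to equivalent distributions (F with 1 and n − 2 degrees of freedom is the square of t with n − 2), the two procedures lead to the same decision whenever only two means are compared.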

For a convincing answer to the more technical details of Dr. Woolf’s criticisms, the reader may be referred to Professor Mather’s own paper in the same publication ([33], pp. 103-111). As he remarks, there are certain minor but undeniable defects in the particular researches that Dr. Woolf has discussed; and these reduce the cogency of the conclusions reached. But most of the shortcomings seem to arise, not from the inadequacy of the statistical procedure but from inadequacies in the data available or in the planning of the experiments by which the data were obtained.


V. ALTERNATIVE HYPOTHESES 

The Inductive Procedure 

In conclusion we should like to draw attention to one point that nearly all our critics have overlooked, namely, the logical form of the argument by which we have defended the hypothesis advanced. As in all empirical problems, the whole argument is essentially inductive, not deductive; and it is convergent rather than serial. Dr. Harrison declares that “the chain of Burt’s reasoning can be no stronger than its weakest link: so long as any one of his premisses remains open to doubt, the conclusion that hangs on it is liable to fall to the ground, and the reader is left free to choose some other way of explaining the facts”. But the conclusion was sustained by several chains, not by one. In the paper on ‘Evidence of the Concept of Intelligence’, five different proofs were offered: observational, introspective, genetic, neurological, in addition to the mere statistical analysis, which we ourselves regard as no more than confirmatory. Thus the theory we have advanced really rests on what Whewell called ‘a consilience of inductions’.

Nor is it sufficient to show that the facts adduced may be explained in some 
other way: the critic must also show that his way is more probable than ours. 
When Dr. Harrison adds that our own explanation is ‘ never conclusively proved’, 
we readily agree: in the natural sciences, nothing is ever conclusively proved. 
The final inference was put forward not as conclusive or certain, but merely as 
more probable than any alternative explanation. The only way to rebut this
argument is for the critic to produce a constructive hypothesis of his own, and 
formally demonstrate its higher probability. 

What then are the rival explanations that our opponents have to offer ? 
We have searched their pages [16], [21], [26], [33]; and nowhere can we find a 
clearly defined theory, much less any formal proof, which would show, as Fisher 
for example showed, that the new theory yields a better fit to the facts observed. 
However, judging from fragmentary comments that have emerged in the course 
of the discussion, there would seem to be two more or less plausible alternatives 
that compete with our own. Most of the critics who, like ourselves, accept the 
possibility that intelligence is in part inherited would apparently favour some 
form of the venerable theory of ‘ blended’ inheritance in place of the neo- 
Mendelian theory of ‘ multifactor’ inheritance. The rest of our opponents 
seem to hold that inheritance plays only a negligible or at any rate an unimportant 
part, and ascribe the observable differences between individuals primarily to 
environmental influence. 

1. Blended Inheritance. The theory of ‘blended inheritance’ has the sanction of Darwin’s name behind it, and was almost universally accepted until the dramatic rediscovery of Mendel’s hypothesis of ‘particulate inheritance’. The theory of blending is also the explanation which, by its very simplicity, naturally commends itself to those who are unfamiliar with modern genetics. The facts most frequently adduced in its support are those of ‘regression’, which both Galton and Pearson interpreted in terms of blending. Put in explicit shape, the argument runs somewhat as follows. Suppose the general mean of the population to be 100 I.Q. units, and that of the father to be $100 + x$ (where $x$ may be either positive or negative); then, on the hypothesis of random mating, the most probable level of the wife’s intelligence will be 100; hence that of the child will be the average of the two, namely,

$$\tfrac{1}{2}\{(100 + x) + 100\} = 100 + \tfrac{1}{2}x;$$

thus, as a result of bisexual reproduction, there will be a regression of 0.50. To take account of the effects of assortative mating and remoter ancestry, a slightly more complex version was elaborated by Karl Pearson ([30], p. 456). But the underlying principle remains the same.
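
The arithmetic of this regression can be set out as a short sketch. The function name `expected_child_level` is hypothetical; the optional parameter `r` (the correlation between mates) is an added assumption which takes the mother’s expected deviation to be `r` times the father’s, and is a simplification, not Pearson’s fuller formula. With `r = 0` it reproduces the random-mating case described above.

```python
def expected_child_level(father, population_mean=100.0, r=0.0):
    """Expected level of the child when the child is the average of its parents.

    father          -- the father's level in I.Q. units
    population_mean -- the general mean of the population
    r               -- assumed correlation between mates (0 for random mating)
    """
    x = father - population_mean          # father's deviation from the mean
    expected_mother_deviation = r * x     # zero under random mating
    return population_mean + 0.5 * (x + expected_mother_deviation)


for father in (70.0, 100.0, 130.0):
    print(father, expected_child_level(father))
```

Under random mating a father 30 points above the mean is thus expected to have a child 15 points above it, which is the regression of 0.50 stated in the text.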

Here, however, a mere comparison of means (such as Dr. Woolf recom- 
mends) does not enable us to decide between the hypotheses of blending and of 
multiple factors, since, as we have seen, the value deduced from each of the 
two theories would (under the conditions stated) be exactly the same. But, 
when we turn to the second degree statistics, we find a striking difference. 
For simplicity, let us take the general mean to be zero; and let the deviations
of father, mother, and child be $x$, $y$, and $z$ respectively, and the correlation
between fathers and mothers $r$. Then (ignoring such complicating influences as
dominance and environmental differences) the deviation of the child will, on both
hypotheses, be the average of its parents’ deviations, viz., $z = \tfrac{1}{2}(x + y)$,
or on the average $\tfrac{1}{2}x(1 + r)$ ([9], p. 115). If there are $N$ males and $N$
females in each generation, the variance of the adults will be $\sum x^2/N$ and
$\sum y^2/N = \sigma^2$, say (since with autosomal
factors the difference between the variances of the two sexes would be nil). On the
multifactor hypothesis, as we have seen, the variances of the children would
remain the same ([9], p. 110); but, on the hypothesis of blended inheritance,
the variances of the children would be
$\sum z^2/N = \tfrac{1}{4}\sum (x^2 + 2xy + y^2)/N = \tfrac{1}{2}\sigma^2$ if
mating is random, or $\tfrac{1}{2}(1 + r)\sigma^2$ if mating is assortative. Thus, the effect of
blending is to reduce the inheritable variance by roughly one half in each 
generation. Now the data both for bodily and for mental characteristics make 
it clear that no such reduction takes place. On this hypothesis, therefore, 
we are driven to assume, as Darwin did, that some unknown cause or causes 
must be continually at work to produce fresh ‘spontaneous variations’. But
all that we know of the rate and nature of mutation puts any such ad hoc 
assumption completely out of court. 
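
The halving of the variance under blending can be checked numerically. What follows is only an illustrative sketch, not a computation taken from the original data: the function name, the standard deviation of 15 I.Q. units, the sample size, the random seed, and the value r = 0.4 for assortative mating are all assumptions chosen for the illustration. Each child is given the plain average of its parents' deviations, and the ratio of the children's variance to the fathers' variance is compared with the value (1 + r)/2 obtained above.

```python
import random
import statistics


def blended_variance_ratio(r=0.0, n_pairs=20000, sigma=15.0, seed=0):
    """Return var(children) / var(fathers) when each child is the parental mean.

    r is the assumed correlation between fathers and mothers; sigma, n_pairs
    and seed are illustrative values, not figures from the paper.
    """
    rng = random.Random(seed)
    fathers, children = [], []
    for _ in range(n_pairs):
        x = rng.gauss(0.0, sigma)  # father's deviation from the general mean
        # Mother's deviation: same variance as the father's, correlation r with it.
        y = r * x + (1.0 - r * r) ** 0.5 * rng.gauss(0.0, sigma)
        fathers.append(x)
        children.append(0.5 * (x + y))  # blending: child = average of the parents
    return statistics.pvariance(children) / statistics.pvariance(fathers)


if __name__ == "__main__":
    for r in (0.0, 0.4):
        print(f"r = {r}: simulated ratio {blended_variance_ratio(r):.3f}, "
              f"formula {(1 + r) / 2:.3f}")
```

With random mating the simulated ratio should lie close to one half, and with assortative mating close to (1 + r)/2, so that on the blending hypothesis the variance would dwindle from generation to generation unless some fresh source of variation were supplied.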

Accordingly, in the absence of more positive evidence in its support, we 
may conclude that the hypothesis of blended inheritance has a far lower 
probability than the hypothesis we have proposed. 

2. Environmental Influence. Those who hold that the mental differences
between individuals are the effect chiefly of environmental conditions commonly
follow Watson and the behaviourist school, and draw a sharp distinction between
behaviour and structure. “Differences in structure”, we are told, “are, or
may be, to a large extent hereditary; differences in behaviour are due almost 
exclusively to training. ... Man is, before all others, an educable animal; and 
conditioning—the simplest form of education—begins so early, even as early as 
embryonic life, that any differences in hereditary conditions are completely 
overlaid by the effects of learning”. “Give me a dozen healthy infants”,
says Watson, “and my own specified world to bring them up in, and I will 
guarantee to train any one to become any type of specialist—doctor, lawyer, 
artist, beggar-man, or thief” ([39], pp. 104f.). These and similar optimistic 
claims are, we are told by Dr. Fleming and others, largely borne out by the 
work of Dr. Stoddard and his colleagues at Iowa ([16], pp. 111f.). As a result, 
“since the opening of the fifth decade of the present century a greater belief
in educability has characterized educational thinking”. Nevertheless, Dr.
Fleming continues, even so recently as 1931, “it was possible for an official
publication to announce” that the psychological evidence “necessitated the
provision, not merely of separate classes, but of separate types of schools for
pupils of differing capacity”; and she quotes Burt as an “early sponsor” of
this view, who has “recently repudiated” it. “It is no longer supposed”,
she adds, “that children can thus be classified into a small number of discrete
types”. Children who were formerly classified as ‘incurably defective’ are
now grouped with milder cases of backwardness, and given special training;
and even the current doctrine that children who are ‘innately bright’ can be
picked out at 11 plus for transference to grammar schools is “now thought to
be an injustice to those who are not c 

There is a slight misunderstanding here: Burt has never “repudiated”
the evidence for separate schools and classes. But for the rest, we should be in
fairly close accord with Dr. Fleming's comments on the administrative changes
in current educational practice. Yet that is not because we attach less weight
than before to the importance of genetic influences. In any case, this “new
emphasis on the powerful effects of environment” is by no means so novel
as these writers apparently suppose. It constituted the chief explanation for
individual differences offered by the ‘philosophic radicals’ of the nineteenth
century, the founders of the old associationist school which for so long
dominated educational theory in this country; and even Darwin at first attributed
‘individual as distinct from specific or varietal difference’ to ‘environmental
pressure rather than inborn endowment’, until, as he handsomely owned, Galton’s
book on Hereditary Genius ‘made a convert’ of him.¹ The argument which
Galton himself used to controvert ‘the excessive reliance on environmental
explanations’ could be turned no less effectively against American and British
behaviourists of the present day. Since they admit that man’s ‘unique
educability’ is inherited, just like his power of speech, his foetal face, and his
lack of a visible tail, is it not extremely probable that innate deviations in the
former characteristic (as in all other innate traits) should distinguish different


¹ See the letter quoted in Galton's Memories of My Life (1909), p. 290.




members of the race? Why should educability be the sole biological trait 
which exhibits no individual variation?

How precisely differences in the environment are supposed to act in order
to produce the wide differences observable between one individual and another, 
is hinted at only in the most vague and speculative way. Nor is it altogether 
clear what exactly its partisans include under the term environment. We are 


(i) The Acquisition of Knowledge or Skill. The older British associationist 
school, and their descendants the American behaviourists, refer almost exclusively 
to the consequences of what they variously call ‘education’, ‘training’, or
‘conditioning’. If, they say, the child from a higher socio-economic class 
generally manifests a higher degree of intelligence, he owes it simply to the fact 
that he went to a better school or received a better training at home. How then 
are his superior opportunities supposed to operate? Chiefly, it would seem,
though perhaps not solely, by allowing and encouraging him to learn (a) certain 
items of knowledge and (b) certain forms of skill. These, it is argued, will fully 
account for his successes or his failures in the tests that the psychometrist 
employs. 

Certainly with individual tests like those of the Terman-Binet scale, the 
child from the poorer and less cultured home is penalized in questions that 
depend on the knowledge of recondite words (as in the Vocabulary Test) or on 
certain semi-scholastic skills (like reading, writing, and drawing). But the child 
from the ‘privileged classes’, brought up, as he so often is, in a sheltered home,
is just as likely to be inexperienced in such practical matters as giving change, 
discriminating weights, or finding a lost ball. Here it is to be noted that, as 
we have pointed out in earlier studies of such tests, the advantage by no means 
rests exclusively with the child from a superior socio-economic class ([4], 
pp. 164f). 

As for group tests, it is said, they are usually verbal: they require an ability 
to read words, to understand words, and sometimes to spell them. A child's
performance in such tests, so it is maintained, will consequently depend upon
his schooling and the cultural background of his home. But this is a non sequitur.
Because the reaction required by the test includes an activity that


likely to be harder for some children than for others; and will so arrange the 




lives on barges or canal-boats—the test-results may at times be distorted by 
environmental handicaps. But, when that happens, the blame should rest 
rather with the investigator who chose an unsuitable procedure than with the 
test as such. A school psychologist who is attached to the education authority’s 
staff should have ample opportunity and adequate resources for detecting these 
exceptional cases; and the need for such vigilance, and the systematic use of 
various means for checking such possibilities, were points that were fully 
recognized in the days when backward pupils could be nominated for certification 
as mentally defective. Hence we feel fully convinced that, in our own data, 
errors of this kind have been almost entirely obviated by the plan we have 
adopted. 

(ii) Stimulation. Many of our critics would be willing to accept these various 
contentions. But they will explain that what they themselves mean by environ- 
mental aids are not so much the items of knowledge or skill acquired at school 
or elsewhere, but rather the general mental stimulation provided by a cultured 
home. The ‘stimulation’ is supposed to operate partly by giving the child 
ample scope and encouragement for exercising his latent powers, and partly
by “inculcating a higher motivation and habit of accurate and speedy work”.
And thus, as Dr. Fleming tells us, “the longer a child remains in unstimulating
environments, the lower becomes his performance in tests of what is called
intelligence, and the earlier he is transferred to intellectually stimulating
situations the higher becomes his measurable response”.

This version of the environmental hypothesis owes its popularity to the 
widespread belief in ‘transfer of training’. Intelligence, it will be seen, is 
conceived as a kind of faculty, which is strengthened by continued practice and 
liable to atrophy through disuse. Without wholly endorsing this familiar 
doctrine in its usual guise, we are inclined to agree that certain tests may in 
some small measure be influenced by the child’s acquired habits of alertness, 
responsiveness, and the like. At the same time, it is our impression that the 
effects of such ‘stimulation’ are to be traced quite as often among children 
from the less cultured classes as among those from better educated families.
As one of the headmasters told us: “If anything, your tests seem to favour the
‘smart alec’ and the future ‘spiv’ rather than the child from an intellectual
home”. But in any case the effects are slight as well as disputable; and those
who seem to make them part of their hypothesis are avowedly relying on mere 
surmise, and offer no first hand evidence for their views. In the rare instances 
in which the examining psychologist suspected any such factors, it would be his 
duty to allow for differences in attitude or background, and amend his assessment 
in the light of fuller or better information.

(iii) Pre- and Post-natal Health. The environmental conditions to which
we ourselves should attach the greatest weight are those affecting the develop- 
ment of the child during the earliest year or two of life, including the period 
spent in utero. In certain cases there can be little question that illness, weakness, 
or malnutrition during these formative stages has permanently impaired not only 




the child's physical health but also the growth of his central nervous system. 
Moreover, the effects that ensue cannot, by their very nature, be eliminated by 
changing the type of test or by improving the method of assessment. Even 
among well-to-do families, the first child, and those born when the mother is 
advanced in years, seem prone to suffer from pre-natal handicaps and perhaps 
from the inexperience or the incompetence of the mother in her capacity as trainer 
and nurse. Some indication of the effects that follow may be gleaned by 
classifying test-results according to the order of the child's birth: with the 
majority of tests, as one of us has pointed out in earlier 
first born and last born are decidedl 


that is strikingly shown by the 
ortality. Yet, in spite of these 
child population have displayed 
se of these years; and certainly 
t the proportion of educationally 
and feebleminded—has in any way 
nything rather the reverse. 


idence to suggest tha 
backward, 


gained high assessments for intelligen 
usual type. Anyone who may h 
poorer slums of our larger cities 


as living in ' very poor ' or * 


impeccable, much the same p 
hopelessly defective : 
since one can never be 
may not have escaped detection. 


Conclusions

Our final conclusions, therefore, may be summarized. First, there can be no
question that environmental conditions may influence the
results obtained with intelligence tests of the usual kind. How great that influence 
is will depend on the style of test employed and the type of child examined : with 
the ordinary school population, and with tests which have been competently 
constructed, selected, and administered, we believe that environmental differences
are far less influential than genetic, and can as a rule account for no more than 
25 per cent of the actual variance. Secondly, we do not deny that environmental 
factors of the kinds we have enumerated, particularly the last, may to some 
extent have affected the observable intelligence of individual children to a degree 
which no refinements in the methods of testing or assessment could possibly 
eliminate. If so, however, it still seems clear that such effects must be comparatively
slight; otherwise it would not be so difficult to secure direct and indisputable
evidence to demonstrate their presence. We therefore conclude that
environment can have played only a negligible part in determining the actual
figures (the variances, the correlation coefficients, the univariate and bivariate
distributions) that formed the main basis of our argument.

So far, therefore, this more detailed examination, both of the objections
advanced and of the alternative hypotheses put forward, results in no appreciable
modification in the views or the methods originally suggested, but, in the main, 
tends to confirm rather than to confute the conclusions already drawn. 

At the same time it is necessary once again to emphasize that these conclusions
are strictly limited and avowedly provisional. The most serious defect
in our argument has been the lack of any method for assessing the probable
margin of error. To meet the doubts expressed by our various critics, it is
necessary not merely to demonstrate the presence of genetic factors, but also to
determine, if possible in quantitative terms, their practical importance. Their
mere existence Hogben, for example, is ready to acknowledge; but apparently
he would prefer to leave it an open question whether their share in the variance
amounts to 70 per cent or perhaps to no more than 10 per cent. That, however,
would be useless for practical purposes. With our own data, we believe that a
rough indication of the margin of error is furnished by the factor we termed
‘unreliability’. But a far more rigorous proof is needed. The next step,
therefore, should be to plan a full scale inquiry on systematic lines which will 
permit the application of such tests of significance as are already available (cf. [28]). 

And even if our own estimates are accepted as reasonable approximations, 
it must be frankly recognized that the conclusions so reached are valid only in
reference to the particular conditions under which they were obtained. They
would not hold good (a) of other mental traits, (b) of different modes of assessment,
(c) of a population of different genetic composition, or (d) of a population
at a different cultural level: much less would they hold good if there were any
subsequent change (e) in the actual distribution of environmental and genetic 
characteristics, or (f) in the influences affecting their mutual interaction. 

But, when all is said, mere statistical inquiries are not enough. Nothing 
like a decisive pronouncement on the issues we have raised can be looked for 
until more extensive research has been undertaken in the field of psychogenetics. 




In our opinion an adequate understanding of the processes involved can be 
gained only by carefully planned experiments on lowlier animals, where pure 
strains can be secured and breeding controlled. When we know more about 
the genetics of intelligence in animals, then we may be able to construct with 
greater confidence a more exact hypothesis regarding the transmission of 
intelligence in man. We therefore fully endorse the contentions of a recent 
reviewer who declares that “it is a matter of shame and regret that only an 
amateurish beginning has been made by psychologists in using pure lines in 
fundamental research in the nature-nurture area ” ([36], p. 344). 
… the practical decisions we all of us hav… or matrimonial, are bound to have
genetic consequences; and, if the findings here recorded have any validity,
those consequences may be so far-reaching that the questions involved shoul…


REFERENCES

[1] Broad, C. D. (1947). The Mind and its Place in Nature. London: Kegan Paul.
[2] Burt, C. (1912). The inheritance of mental characteristics. Eugen. Rev., IV, …–200.
[3] Burt, C., and Moore, R. C. (1912). The mental differences between the sexes. J. Exp. Pedag., I, 273-284.
[4] Burt, C. (1921). Mental and Scholastic Tests. London: P. S. King.
[5] Burt, C. (1935). The Subnormal Mind. London: Oxford University Press.
[6] Burt, C. (1937). The Backward Child. London: University of London Press.
[7] Burt, C. (1943). Ability and income. Brit. J. Educ. Psychol., XIII, 83-98.
[8] Burt, C. (1955). Evidence for the concept of intelligence. Brit. J. Educ. Psychol., XXV, 158-177.
[9] Burt, C., and Howard, M. (1956). The multifactorial theory of inheritance and its application to intelligence. Brit. J. Statist. Psychol., IX, 95-131.
[10] Campbell, F. (1956). Eleven Plus and All That. London: Watts.
[11] Chesterton, G. K. (1922). Eugenics and Other Evils. London: Cassell.
[12] Darlington, C. D., and Mather, K. (1949). The Elements of Genetics. London: Allen and Unwin.
[13] David, P. R., and Snyder, L. H. Genetic Va… in Social Psychology at the Crossroads. (E…
[14] Dunn, L. C., ed. (1941). Genetics in the Tw…
[15] Fisher, R. A. (1918). The correlation between relatives on the supposition of Mendelian inheritance. Trans. Roy. Soc. Edin., LII, 399-433.
[16] Fleming, C. M. (1948). Adolescence. London: K. Paul.
[17] Glass, D. V. (1954). Social Mobility in Britain. London: Longmans.
[18] Haldane, J. B. S. Heredity and Politics. London: Allen and Unwin.
[19] Haldane, J. B. S. (1946). The interaction of nature and nurture. Ann. Eugen., …
[20] Hazel, L. N. (1943). The genetic basis for constructing selection indexes. Genetics, XXVIII, 476-490.
[21] Heim, A. W. (1954). …
[22] Hogben, L. (1931). Genetic Principles in Medicine and Social Science. London: Williams and Norgate.
[23] Hogben, L. (1933). Nature and Nurture. London: Williams and Norgate.
[24] Lewis, D. G. (1956). The normal distribution of intelligence: a critique. Brit. J. Psychol. (In press.)
[25] Lush, J. L. (1947). Family merit and individual merit as bases for selection. Amer. Nat., LXXXI, 241-261, 362-379.
[26] Maddox, H. (1956). Nature-nurture balance sheets. Brit. J. Educ. Psychol. (In press.)
[27] Martin, F. M. (1947). Home background and selection for secondary education. Eugen. Rev., XLVIII, 195-202.
[28] Mather, K. (1949). Biometrical Genetics: The Study of Continuous Variations. London: Methuen.
[29] Miles, T. R. (1957). On defining intelligence. Brit. J. Psychol. (In press.)
[30] Pearson, K. (1900). Grammar of Science. London: A. and C. Black.
[31] Pearson, K. (1906). Nature and Nurture. London: Dulau.
[32] Pearson, K., and Lee, Alice (1903). On the laws of inheritance in man. I. The inheritance of physical characters. Biometrika, II, 357-462.
[33] Reeve, E. C. R., and Waddington, C. H., eds. (1952). Quantitative Inheritance. London: H.M. Stationery Office.
[34] Simon, B. (1953). Intelligence Testing and the Comprehensive School. London: Lawrence and Wishart.
[35] Smith, Fairfield (1937). A discriminant function for plant selection. Ann. Eugen., VII, 240-250.
[36] Stone, C. P. (1947). Methodological resources for the experimental study of innate behaviour as related to environmental factors. Psychol. Rev., LIV, 342-347.
[37] Thomson, G. H. (1948). The Factorial Analysis of Human Ability. London: London University Press.
[38] Vernon, P. E. et al. (1950). … The Advancement of Science (Brit. Assoc. Ann. Rep.). London: Murray.
[39] Watson, J. B. (1931). Behaviorism. London: Kegan Paul.
[40] Whetham, W. C. D., and C. D. (1909). The Family and the Nation. London: Murray.
[41] Royal Commission on Population (1941). Memoranda presented to the Commission. London: H.M. Stationery Office.




BOOK REVIEWS 


The Organization of the Cerebral Cortex. By D. A. Sholl. Methuen, 1956.
Pp. 16+125. 18s.

This is a book which should be read not only by the neurologist but by every serious
student of psychology. Dr. Sholl’s main contention is expressed in his concluding words:
“whether the cortex is studied by the anatomist, the physiologist, or the psychologist,
the model employed should be based on the concept of probability”. It must therefore
be derived from statistical hypotheses and be discussed in statistical language.

He begins with a review of the chief histological and experimental methods available, 
and illustrates the various microscopic techniques with a series of photographs revealing 
a high degree of technical skill. On the basis of his own investigations he then criticizes 
the familiar maps of the cortex which are reproduced in nearly every textbook and are 
cited as demonstrating structural differences between various regions of the brain
corresponding to their supposed differences in function. As Dr. Sholl points out, even the best
of such maps were the outcome of subjective criteria and manifestly influenced by the
preconceived ideas of the cartographers, particularly their “rather crude psychological
theories”. Few recognized “the enormous variability that exists between individual
human brains”. Many of the differences, as Bok showed twenty years ago, arise merely
from irregularities of growth, differences in local blood supply, and above all the varying 
amount of deformation produced by the curving surface of the brain. 

Dr. Sholl himself holds that the characteristic chiefl…

…cally homogeneous medium, Dr. Sholl inclines, if anything,
towards the second: “neural integration must consist of interaction,
not of conduction over relatively isolated paths”.

In his discussion of mental abilities like ‘memory’ or ‘intelligence’, he unhesitatingly
rejects the earlier types of hypothesis that tried to allocate them to “circumscribed regions”.
‘Intelligence’, he holds, like ‘retentivity’, must be a general attribute of cortical activity,
resulting from its over-all mode of organization.

The evidence for these various conclusions is lucidly expounded, and supported by
… However, as he explains, “the statistical mode of description
has been adopted, not because of the technical difficulties involved in the study of individual
… appropriate method to adopt for the study of a system whose
interactions can only be described in terms of probability”.

Cyril Burt.


Psychological Tests and Personnel Decisions. By L. J. Cronbach and G. C.
Gleser. University of Illinois Press, 1957. Pp. …

Most of those who have written on the theory of mental tests have assumed that the
efficiency of any selection procedure may be judged exclusively in terms of the validity
coefficients, regardless of the decisions to be taken. Education committees, however,
especially in the early days of mental testing, were apt to inquire whether the increase in
information so gained would justify the additional cost. Dr. Cronbach has set out to
answer the questions thus raised by developing what Wald has taught us to call ‘statistical
decision theory’. The book which he and his collaborator have produced contains an
admirable account of the construction and use of mental tests from this highly practical
standpoint. It should be in the hands of all who work in the fields of …


THE BRITISH JOURNAL OF STATISTICAL PSYCHOLOGY
Vol. X, Part I. May 1957

CONTENTS

Dunsdon, M. I., and Roberts, J. A. F. A Study of the Performance of 2,000 Children on Four Vocabulary Tests
Bernyer, G. Psychological Factors: their Number, Nature, and Identification
Stuart, A. The Comparison of Frequencies in Matched Samples. 29
Burt, C., and Howard, M. Heredity and Intelligence: A Reply to Criticisms. 33
Book Reviews

JOURNAL COMMITTEE

Cyril Burt, Editor
Charlotte Banks and Alan Stuart, Assistant Editors
J. W. Whitfield, Managing Sub-editor
and
M. Foss, J. S. Smart, Hamilton, J. Sutherland, A. Heron, F. W. Warburton


Vol. X, Part II. The British Journal of Statistical Psychology. November 1957

PROFESSOR LEWIS M. TERMAN


Readers will have heard with deep regret of the death of Professor Terman. 
In this country he was perhaps best known for his revisions of the Binet-Simon 
scale. In the days when mental testing was still looked at askance by the 
majority of psychologists and educationists, he perhaps more than anyone 
succeeded in giving it a scientific status and putting it into practical shape. 
As he himself points out, he had no wish to contribute to statistical theory or 
to the principles of mental measurement, but only to make use of them. And 
by his own example he became one of the most effective advocates of metrical 
techniques, and did much to convince both teachers and administrators of the 
practical value of the new statistical procedures. 

His origins, like those of so many eminent scientists, were curiously mixed.
He was, as he himself puts it, “Scotch, Irish, Welsh, German, and French”.
He was born in a country district about seventeen miles from Indianapolis
on January 15, 1877. From data given by J. M. Cattell in American Men of
Science, we learn that in a group of 1,000 scientists Terman was the only one
belonging to a family of twelve or more. He was in fact the twelfth of fourteen
children; and his father was one of ten. The Termans, Tearmans, or Turmans
were a roving and prolific family; and his branch of it had apparently migrated
westward from Virginia. Most of them, like his grandfather and father, were
hardworking and prosperous farmers. Nevertheless, the household library
contained something like two hundred volumes: and the young Lewis apparently
found books more interesting than the farmyard. Among his favourites he
mentions Dickens, the Encyclopaedia Britannica, and Peck's Bad Boy.

There was, he says, nothing in his early environment that could have
“conditioned him specifically in favour of psychology”. However, “almost
as far back as I remember, I seem to have had rather more interest than the
average child in the personalities of others, and to have been impressed by those
who differed from the common run”. His elementary education was acquired
in “a little red one-room school-house, which could not boast a single library
book”. From eleven to eighteen he worked on the farm for about six months
each year; and “any mental development that occurred during this period must
have been due to maturation rather than to intellectual stimulation”.

At school, and later on at college, he preferred linguistic subjects to mathe- 
matics and physical science. In this respect he notes a marked contrast between 
himself and another celebrated member of his class, Arthur Banta, the biologist.
“Banta,” he says, “was less bookish and less interested in the traits of people,
but far more interested than I in plants, animals, and rocks.” As it happened,
both his school and his neighbourhood contained few boys of his own age.
Most of them were older. And this, he thinks, may have provided ground for
“the inferiority attitude and tendencies to introversion,” which even in adult
years he still revealed in the Bernreuter tests. A disposition of this kind, he
thinks, may be somewhat characteristic of the psychologist, since it inclines a man
towards introspection, and makes him secretly curious about the abilities and 
motives of those around him. Even before he reached his teens, he became 
fascinated by after-images, by the flight of colours, and by the connections 
between one idea and the next. He mentions an early discovery, which recalls 
a similar experience of Tennyson. “When about eleven or twelve years of age,
I hit on the fact that, by repeating a formula, I could lose my sense of identity
and my orientation in time and space : my method was to gaze fixedly on
something, and repeat: ‘Is this me?’ until all distinction between subjective and
objective was lost in a kind of mystical haze.”

While he was still only ten years old, “a pedlar who sold books on
phrenology stopped at the farm for a night, discoursed on his science, and felt the
bumps of each of us... When my turn came to be examined,” says Terman,
“he predicted great things of me”. That, he believes, may have stimulated
him to greater efforts. And the interest in phrenology which was thus implanted 
served as his first introduction to the science of individual differences. 

When he reached the age of fifteen, he went to a ‘normal college’ to prepare
for teaching. Of the various psychological textbooks that he studied, he
“liked Sully” best. James’ Principles he did not discover until later. But,
he adds, “of all the founders of modern psychology my greatest admiration was
for Galton”. At twenty-one he became principal of a township high school,
where he “taught the entire curriculum to about forty pupils”. A year later
he married. His wife relates how he once told her that his “interest in their
baby had determined him to become a psychologist”. In 1901 he went to
Indiana University ; and a couple of years afterwards to Clark University—in 
those days the Mecca for almost every aspiring young psychologist in America. 
He was one of a group of brilliant students, twelve of whom later became suffi- 
ciently distinguished to be included in the American Who's Who. 

For a while he continued under “the hypnotic sway of Stanley Hall”.
But he soon began to feel that the value of Hall's questionnaire procedures was
[…]. His own paramount interest still remained centred on
individual differences, and more especially on methods of studying the more
[…] mentally deficient. He […] recalls two […] essays
which he prepared for seminars […]. As with Thorndike, his mind […]
in this direction partly because he “had no aptitude for, and little interest in,
the laboratory techniques of the experimentalist of the Wundt-and-Titchener
type”. Stanley Hall strongly disapproved. Terman, however, started to
read whatever he could find on the subject, whether in English, French, or
German: he mentions more particularly […] and
above all Thorndike. One thing, he found, was badly needed and that Clark
University did not offer, namely, “instruction in statistical […]; […] a year
with Thorndike would have been an untold boon”.



Unfortunately his studies were interrupted by a pulmonary haemorrhage 
and the discovery of a mildly active tuberculosis. On recovering, he was 
offered the chair of Child Study and Pedagogy at Los Angeles. Here he 
still had to nurse his health. But four years later in 1910 he was appointed 
professor of educational psychology at Stanford. 

The first fruit of his lectures at Stanford was the small textbook on The 
Hygiene of the School Child. Huey, however, who with Gesell had for long 
been one of his closest friends, urged him to start an investigation on the value 
of the Binet scale of tests which had already been used for work on defectives 
by Goddard, Kuhlmann, and others. Accordingly, in collaboration with one
of his graduate students, H. G. Childs, Terman began an experimental study
of the 1911 scale which eventually led to the publication of The Measurement
of Intelligence in 1916.

It was during the same period that Burt started his investigations on much
the same problem ; and it is curious to observe how both of them, working to
begin with, it would seem, with no knowledge of each other’s interests, studied 
the same writers and adopted much the same lines of approach. Both Terman 
and Burt record their indebtedness to Galton’s work in Britain, Meumann’s in
Germany, and Binet’s in France. Owing, however, largely to the influence of
Pearson, Burt became interested in the quantitative problems ; Terman, on
the other hand, related that his “own interest in mental tests at that time was
more in their qualitative than in their quantitative aspects”. But both appear
to have reacted in much the same way to Spearman’s methods and conclusions.
Terman describes how, on reading Spearman’s paper in the American Journal
of Psychology, he was greatly impressed with “the dogmatic tone of the author,
the finality with which he disposed of everyone else, and his
one-hundred-per-cent faith in the verdict of his mathematical formulae”; but, he adds, “the
conclusion to which it all led, namely, that there is ‘a correspondence between
what may provisionally be called General Discrimination and General
Intelligence which works out with great approximation to one or absoluteness’, seemed
to me as absurd then as it does now”.

The book on The Intelligence of School Children and the monograph on the
Stanford Revision data were “by-products of the work that led to The
Measurement of Intelligence”. As in Britain so in America, “many of the old-line
psychologists regarded the whole test movement with scorn”; “hence”, says Terman,
he himself “was a little surprised that his publications were so favourably
received”. Indeed, even at the time, he “imagined the revision would probably
be displaced by something better”. And so eventually it was, namely, by
‘the new revised Stanford-Binet tests’ produced by Terman in collaboration
with Miss Merrill in 1937. British investigators who either then or later on
were trying to modify or standardize the original Binet-Simon scheme, so as to
render it more suitable for use in this country, will always remember with
gratitude the generosity with which Terman permitted them to borrow or adapt
either of his own revisions.



In the meantime, as soon as the first world war was over, Terman devoted
his attention to the standardization of group tests of ability and scholastic tests
for assessing educational achievements. […] Commonwealth Fund enabled him to
[…] always had in mind on ‘the Psychology of gifted individuals’. The result
was the three large volumes, published between 1924 and 1930, under the title
of Genetic Studies of Genius. A few years […] and Personality, based largely
[…], and published in […] his heart, “The Discovery and Encouragement
[…] well worth quoting. “What I especially […] that the evidence on early mental
development […] follow-up of gifted subjects selected by […] early saw, more
[…] mentality testing, and have succeeded in […] their competitors, and, by the
application […] and the major differences in the intelligence test scores of
certain races, will never be fully […]” These conclusions plainly […] the bounds
of psychology; and, if they are gradually winning acceptance, that is largely due
to the long and tireless labours of Professor Terman himself.


WILLIAM B. LEWIS.



Vol. X The British Journal of Statistical Psychology November 
Part II 1957 


THE RELATIONS 
OF THE NEWER MULTIVARIATE STATISTICAL METHODS 
TO FACTOR ANALYSIS 


By HAROLD HOTELLING
University of North Carolina 


Abstract. A survey of developments in multivariate analysis during the last thirty
years shows that some, though not all, of the purposes for which factor analysis
has been used may now be better accomplished by other procedures, e.g. by
regression, multiple correlation, the study of relations between two sets of
variates, or of the dimensionality of a set of variates.

To determine whether two or more groups of persons differ significantly
in their mean values or their covariance matrices, the most appropriate procedures
consist of methods of multivariate analysis of variance. Such methods have
increased rather than diminished the advantages of using an external criterion
instead of making a purely internal analysis.

The problem of estimating the dimensionality of a continuous multivariate
population by means of a sample admits a simple and exact answer : the rank of
the population is equal to that of the sample, provided the number of degrees of
freedom among the individuals in the sample exceeds the number of the variates.
If, however, the problem is re-interpreted so as to imply that each observation
is the sum of a real part and a random error, then the hypothesis that the real
part has a rank smaller than the number of variates can be tested only when a
proper estimate of the errors is available, i.e. by making suitable replications.
The appropriate method is again multivariate analysis of variance rather than
factor analysis. In a third form of the problem the object is, not to ascertain a true
dimensionality, but to assess the error resulting from discarding (on practical
grounds) a dimension that may really exist. Here the problem is somewhat
indeterminate: a practical expedient may involve a special form of factor analysis,
viz. the calculation of the least, not the greatest, latent roots.

Thus, in many cases in which they have hitherto been used, factor analyses
of the usual kinds are inferior to other procedures : nevertheless, the results of
such analyses may have heuristic and suggestive value, and may uncover
hypotheses which are capable of more objective testing by other methods.


I. RECENT DEVELOPMENTS IN MULTIVARIATE ANALYSIS
Factor analysis in psychology, and the more general study of multivariate
analysis in statistics, have developed side by side and to a great extent
independently. Owing to lack of adequate communication between the two groups
of research workers (each spread geographically, but having few contacts
with the other), psychologists have often employed statistical methods
demonstrably inferior to those worked out by mathematical statisticians, while the
latter have often been unaware of the highly interesting activities and stimulating
problems of psychologists. The following paper outlines achievements in
multivariate statistical analysis during the last quarter-century, suggests uses in
psychology of the newly discovered methods, and tries to mark out a boundary
between the cases in which these should supersede methods involving factor
[…] still the most useful tool. It will […] people have undertaken to do by
[…]. The methods here considered have all been derived primarily for use with
multivariate normal distributions ; in other cases they still have certain
descriptive, suggestive, and approximative […]


(i) A primary tenet of modern statistics […] by known probabilities […]
and in sampling. (ii) In selecting statistics […] must maximize the probabilities
of […] ‘[…] foundations of theoretical statistics’ [5]. It […] of descriptive
statistics is logically posterior […] statistical inference, though the practice of
descriptive statistics comes normally before the inference. (iii) Thirdly, the
statistician has abandoned […] of merely analysing whatever was put on his
doorstep by other investigators or by chance, and now has much to say about how
experiments should be designed to make inferences from them valid and the whole
process economical.


The first principle, calling for known probabilities, has stimulated much
ingenious research in methods of finding probabilities; but it has also done
something more fundamental in stipulating that the probabilities are to be
known numbers wherever possible, and not merely functions of unknown
parameters that […]. It […] ‘Student’ distribution […] chosen ‘statistic’ […].
If $p$ is at all large, […] as is usual, none of them is known […] and the
distribution […] observations, may involve all of them. […]
which I have termed ‘nuisance parameters’, entering into the most relevant
probabilities.

This need of keeping down the nuisance parameters limits drastically the
statistical methods appropriate for use with multivariate normal distributions,
going beyond the second, or ‘efficiency’ tenet, which leads to the use of sufficient
statistics where they exist, as they do for the multivariate normal distribution.
The sufficiency criterion tells us here that we should use no statistic that is 
not a function of the sample means, variances, and covariances. The new 
criterion tells us further that among these functions we should seek to use only 
those in whose distributions the nuisance parameters can be kept under control 
and ideally are non-existent. Many of the powerful methods discovered in the 
last three decades are based on this principle. 

The first example is the multiple correlation coefficient, whose general
distribution in samples from a multivariate normal population was discovered
by Fisher [6]. It looked at first as if the distribution would involve all the
covariances as independent parameters, and therefore be excessively complicated
and difficult to determine as a function; moreover, even if the distribution
function of $R$ were determined, it would be useless if it involved all these many
nuisance parameters, since they are themselves unknown. Fortunately the
situation was saved by the invariance of $R$ under all non-singular linear
transformations of the predictors, or ‘independent variables’. Such a transformation
can always be found that will make the predictors mutually uncorrelated, and will
also make all but one of them uncorrelated with the predictand. Thus in the
transformed system there is only one correlation that can be different from zero,
and this equals the population multiple correlation coefficient $\rho$. Hence the
distribution of $R$ involves only $\rho$; and Fisher, using this invariance property
and some ingenious geometry, with other tricks, found the exact probability
density.

The great principle illustrated by Fisher’s success with the multiple
correlation coefficient is the possibility of eliminating nuisance parameters
and simplifying exact probabilities when the statistics used are invariant under
all non-singular linear transformations of the variables, or even under an extensive
class of such transformations such as the orthogonal. This puts a premium
on the use of invariant statistics, and points to the selection of such statistics to
meet various needs. The criterion of invariance has been influential ever since
Fisher’s 1928 paper in the development of new methods of multivariate analysis.

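The invariance of $R$ described above can be checked numerically. The following is only an illustrative sketch in plain Python: the helper names (`solve`, `multiple_correlation`, `transform`), the simulated data and the matrix `c` are assumptions made for this example and do not come from the paper.

```python
import math
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def centre(v):
    """Subtract the mean from every element of v."""
    mean = sum(v) / len(v)
    return [a - mean for a in v]


def multiple_correlation(xs, y):
    """Sample multiple correlation R of y on the predictor columns in xs."""
    xc = [centre(col) for col in xs]
    yc = centre(y)
    p = len(xc)
    # Normal equations (X'X) b = X'y on the centred data.
    xtx = [[sum(u * v for u, v in zip(xc[i], xc[j])) for j in range(p)]
           for i in range(p)]
    xty = [sum(u * v for u, v in zip(xc[i], yc)) for i in range(p)]
    b = solve(xtx, xty)
    fitted = [sum(b[i] * xc[i][k] for i in range(p)) for k in range(len(yc))]
    return math.sqrt(sum(f * f for f in fitted) / sum(v * v for v in yc))


def transform(xs, c):
    """New predictor i is the linear combination sum_j c[i][j] * xs[j]."""
    n = len(xs[0])
    return [[sum(c[i][j] * xs[j][k] for j in range(len(xs))) for k in range(n)]
            for i in range(len(c))]


rng = random.Random(0)  # fixed seed; the data are purely illustrative
x1 = [rng.gauss(0, 1) for _ in range(30)]
x2 = [0.5 * a + rng.gauss(0, 1) for a in x1]
y = [a - 0.3 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
c = [[2.0, 1.0], [-1.0, 3.0]]  # determinant 7, hence non-singular

r_before = multiple_correlation([x1, x2], y)
r_after = multiple_correlation(transform([x1, x2], c), y)
print(r_before, r_after)  # the two values agree up to rounding error
```

Any other non-singular choice of `c` should leave the value unchanged, since the transformed predictors span the same plane as the original ones.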


II. THE GENERALIZATION OF STUDENT'S DISTRIBUTION
The manifold uses of the Student distribution in one dimension, for example
in testing the significance of means and differences of means, and in laying down
confidence intervals with known probabilities, had long required generalization
to two or more dimensions, and various unsuccessful attempts had been made
in this direction. Notable was the ‘Coefficient of Racial Likeness’, proposed
by Karl Pearson [13], which, like other such efforts, foundered on the nuisance
[…] with each square divided […] from the data of both samples. […]

The trouble was that this statistic, […] with enthusiasm in the 1920's,
had a distribution that was not only […]

[…] point whose coordinates $\xi_i$ are the true but unknown mean values of
the […] be accomplished as follows. Defining $n = N - 1$ and

$$ s_{ij} = S(x_i - \bar{x}_i)(x_j - \bar{x}_j)/n, $$

where the summation is over the sample, we invert the matrix $[s_{ij}]$ to get
$[l_{ij}]$, and consider the quadratic form

$$ T^2 = N \sum_i \sum_j l_{ij} (\bar{x}_i - \xi_i)(\bar{x}_j - \xi_j). $$

This expression is invariant under non-singular linear transformations […]
possible to obtain its […], easily transformable to the […]. Since tables of the
[…] from them the numerical […] $n_1 = p$ and $n_2 = n - p + 1$ […]
Statistics and Mathematics […] general formula for $T^2$ is […]

where the sample numbers are $N$ and $N'$. The distribution is the same as before,
with $n = N + N' - 2$; this $n$ is also the denominator of each of the covariance
estimates, which are formed by pooling corresponding product sums in the
two samples.
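
As a hedged illustration of the one-sample quadratic form, the sketch below computes $T^2$ from a small simulated sample, converts it to a variance ratio with $p$ and $n - p + 1$ degrees of freedom (the standard relation, stated here as an assumption because the corresponding passage is incomplete), and checks two properties: invariance under a non-singular linear transformation, and reduction to the square of Student's $t$ when $p = 1$. All function names and data are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def hotelling_t2(sample, xi):
    """Return (T^2, variance ratio F, degrees of freedom) for hypothesised means xi."""
    big_n = len(sample)
    p = len(xi)
    n = big_n - 1
    means = [sum(row[i] for row in sample) / big_n for i in range(p)]
    # Covariance estimates s_ij with divisor n = N - 1.
    s = [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in sample) / n
          for j in range(p)] for i in range(p)]
    d = [means[i] - xi[i] for i in range(p)]
    w = solve(s, d)  # equals [l_ij] d without forming the inverse explicitly
    t2 = big_n * sum(di * wi for di, wi in zip(d, w))
    f = t2 * (n - p + 1) / (n * p)
    return t2, f, (p, n - p + 1)


def apply(c, v):
    """Multiply the matrix c by the vector v."""
    return [sum(c[i][j] * v[j] for j in range(len(v))) for i in range(len(c))]


rng = random.Random(1)  # illustrative data only
sample = [[rng.gauss(0, 1), rng.gauss(0, 2)] for _ in range(20)]
xi = [0.2, -0.1]
t2, f, df = hotelling_t2(sample, xi)

# Invariance: transform both the observations and the hypothesised means.
c = [[1.0, 2.0], [0.0, 3.0]]
t2_c, _, _ = hotelling_t2([apply(c, row) for row in sample], apply(c, xi))

# With a single variate, T^2 is the square of Student's t.
col = [[row[0]] for row in sample]
t2_one, _, _ = hotelling_t2(col, [0.2])
mean = sum(r[0] for r in col) / 20
sd = math.sqrt(sum((r[0] - mean) ** 2 for r in col) / 19)
t_student = (mean - 0.2) / (sd / math.sqrt(20))
print(t2, t2_c, t2_one, t_student ** 2)
```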

S. S. Wilks [17] studied certain generalizations of the analysis of variance 
to p dimensions, using determinants and ratios of determinants having in- 
variantive properties under linear transformations. Other types of multivariate 
analysis of variance have since been investigated by S. N. Roy, P. L. Hsu, 
M. S. Bartlett and others. These are suitable, among other things, for testing 
significance of differences among three or more groups, representing for example 
different treatments, and generalize the use of $T$ as the ordinary univariate analysis
of variance generalizes the use of the Student $t$.

If in the formula above for $T^2$ the variates entering it do not all have
zero expectations, the distribution is of a more complicated form which was
obtained by R. C. Bose and S. N. Roy [4]. As in the case of the multiple correlation
coefficient, the invariantive property made it possible to show that only one
parameter enters into the distribution, and enabled Bose and Roy to obtain the
probability element as a multiple of a confluent hypergeometric function. When
these variates are suitable multiples of the differences between corresponding sample
means in two populations, the parameter $\Delta^2$ in the distribution of $T$ is a quadratic
form of the same type as $T^2$, with population instead of sample means and
covariances; $\Delta$ may then be called the figurative distance between the two
populations if these have the same covariance matrix, and possesses the properties
of distance, including the triangular inequality. The Pearson Coefficient of
Racial Likeness should be replaced by $T^2$ for samples, by $\Delta^2$ for populations.
Much empirical work using this modernized coefficient of racial likeness has
been carried out in Bengal under the leadership of P. C. Mahalanobis. For
unequal covariance matrices in the two populations and matched samples,
$T$ may be used in an obvious way generalizing Student’s celebrated method of
comparing hours of sleep gained by using two isomers of hyoscyamine
hydrobromide.


III. THE RELATIONS BETWEEN TWO SETS OF VARIATES

A different kind of invariance is involved in the determination of relations 
between two sets of variates, as in my paper in Biometrika [9]. One set might 
consist for example of scores in several mental tests given to army recruits, and the 
other of scores on various kinds of subsequent performance. The determina- 
tion of a linear function of the test scores and a linear function of the later 
performance scores, such that the correlation between these two linear functions 
is as great as possible, leads to a determinantal equation whose greatest root is 
the required maximum correlation. The other roots determine other matching 
pairs of linear functions, such that each of these linear functions of the variates 
in one set is uncorrelated with all but its own partner in the other set. These 
functions are the canonical variates ; and the correlations between matching pairs
are the canonical correlations between the two sets of variates. If the canonical
correlations are all zero, the two sets are completely independent; and it is
useless to predict performance by means of any of the tests used. If one of the
canonical correlations is unity, there is something about the performance,
represented by a certain linear function of the performance measures used, that 
can be predicted perfectly by means of the tests. If two canonical correlations 
are unity, there is a whole one-parameter family of types of performance that 
can be predicted perfectly by means of these tests. 
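
The determinantal equation can be illustrated for the smallest non-trivial case. The sketch below is restricted, as an assumption made for brevity, to two variates in each set, so that the squared canonical correlations are the two latent roots of the 2 x 2 matrix $S_{11}^{-1} S_{12} S_{22}^{-1} S_{21}$ and can be found from its trace and determinant. The names and the simulated ‘test’ and ‘performance’ scores are hypothetical; the first performance measure is constructed as an exact linear function of the tests, so the greatest canonical correlation should be unity.

```python
import math
import random


def cov(u, v):
    """Sample covariance of two equally long lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)


def cov_block(us, vs):
    """Matrix of covariances between every variate in us and every variate in vs."""
    return [[cov(u, v) for v in vs] for u in us]


def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def canonical_correlations(xs, ys):
    """Canonical correlations, greatest first, for two sets of two variates each."""
    s11, s22, s12 = cov_block(xs, xs), cov_block(ys, ys), cov_block(xs, ys)
    s21 = [list(r) for r in zip(*s12)]
    m = matmul(matmul(inv2(s11), s12), matmul(inv2(s22), s21))
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = math.sqrt(max(tr * tr - 4 * det, 0.0))
    lams = [(tr + root) / 2, (tr - root) / 2]  # squared canonical correlations
    return [math.sqrt(min(1.0, max(0.0, lam))) for lam in lams]


rng = random.Random(2)  # illustrative data only
x1 = [rng.gauss(0, 1) for _ in range(40)]
x2 = [rng.gauss(0, 1) for _ in range(40)]
y1 = [a + b for a, b in zip(x1, x2)]          # perfectly predictable from the tests
y2 = [rng.gauss(0, 1) for _ in range(40)]     # unrelated to the tests
rho = canonical_correlations([x1, x2], [y1, y2])

# Recombining the performance measures linearly leaves the correlations unchanged.
z1 = [2 * a + b for a, b in zip(y1, y2)]
z2 = [a - b for a, b in zip(y1, y2)]
rho_z = canonical_correlations([x1, x2], [z1, z2])
print(rho, rho_z)
```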

A use sometimes made for factor analysis in the past is in testing for the
relations between two sets of variates. One study of the relations of character
with mentality, for example, used […] character traits by acquaintances,
[…] seven character factors. A name was applied […]; and a plausible matching
of character […] so as to get seven pairs. The correlations […] then computed,
and judged in each case […]

[…] factor analysis should clearly be superseded by an […] relations between the
two sets of variates. The […] of a relation between the two sets if one actually
exists. The other canonical correlations, in decreasing order of size, and the
corresponding canonical variates, will help to elucidate the nature of any relations
that may exist between character and mentality.

[…] to resist the temptation to perform […] regression problems, in which a
least-[…] for a single variate $y$ is to be determined by variates $x_1, \ldots, x_p$.
[…] of related variates, in which the number […] canonical correlations to zero.
If as a […] factor analysis on $x_1, \ldots, x_p$ (let us say a […]) the resulting
‘factors’, or components, […] same multiple correlation with $y$ as did the […]
for $y$ with them as predictors will […] as if the factor analysis had not been
[…] is trifling as compared with the heavy […], including the […]. (Here of
course exceptions must […]) this labour virtually to zero; […]
in the regression analysis, the results will be virtually affected. If even the last
principal component, the one making the least contribution to the total variance,
is omitted, both the regression function and the multiple correlation coefficient
may be drastically different.

To see this, consider the case of two predictors, $x_1$ and $x_2$, which with the
predictand $y$ are represented by three vectors through the origin in the sample
space. The multiple correlation is the cosine of the angle $\theta$ made by the $y$-vector
with the plane of the two $x$-vectors, and the regression values are represented
by the vector $Y$ within this plane making a minimum angle with $y$, and therefore
the orthogonal projection of $y$ on the plane. Replacement of the two predictors
by their two principal components or other ‘factors’ is equivalent to rotating
each of the $x$-vectors within their common plane, a process that does not affect
the regression vector $Y$ nor the angle $\theta$, since these depend only on the plane and
on the external predictand (or criterion) $y$. But suppose only one of these
‘factors’ is used. The plane now collapses into one of the lines in it; and the
regression vector is then forced to lie along this line, which is not determined
by $y$, but only by the $x$'s. There is nothing to prevent, or even to make improbable,
a great increase in the angle of the new regression vector with $y$, with a
corresponding decrease of the correlation and of the accuracy of prediction. It is
possible for even a high correlation to be reduced to zero when the set of predictors
is replaced by a reduced number of linear functions of them, such as might be
obtained by factor analysis, and in virtually every case there will be some decrease
of correlation and of the accuracy of the prediction formula obtained.
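
The extreme case just mentioned, a perfect multiple correlation reduced to zero, can be exhibited with a small constructed example. The sketch below is an illustration under stated assumptions: the data are artificial and chosen so that the criterion $y$ lies entirely along the component of least variance, and the function names are hypothetical. It uses the standard two-predictor formula $R^2 = (r_1^2 + r_2^2 - 2 r_1 r_2 r_{12}) / (1 - r_{12}^2)$.

```python
import math


def corr(u, v):
    """Product-moment correlation between two equally long lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def multiple_r(u, v, y):
    """Multiple correlation of y on the two predictors u and v."""
    r1, r2, r12 = corr(u, y), corr(v, y), corr(u, v)
    r_sq = (r1 * r1 + r2 * r2 - 2 * r1 * r2 * r12) / (1 - r12 * r12)
    return math.sqrt(max(r_sq, 0.0))


def principal_components(u, v):
    """First and last principal components of two variates (covariance basis)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    uc = [a - mu for a in u]
    vc = [b - mv for b in v]
    s11 = sum(a * a for a in uc)
    s22 = sum(b * b for b in vc)
    s12 = sum(a * b for a, b in zip(uc, vc))
    theta = 0.5 * math.atan2(2 * s12, s11 - s22)  # direction of greatest variance
    c, s = math.cos(theta), math.sin(theta)
    first = [c * a + s * b for a, b in zip(uc, vc)]
    last = [-s * a + c * b for a, b in zip(uc, vc)]
    return first, last


# Artificial data: a large common part t and a small, uncorrelated part e.
t = [-2, -2, -1, -1, 0, 0, 1, 1, 2, 2]
e = [0.5, -0.5] * 5
x1 = [a + b for a, b in zip(t, e)]
x2 = [a - b for a, b in zip(t, e)]
y = [a - b for a, b in zip(x1, x2)]  # the criterion depends only on the small part

first, last = principal_components(x1, x2)
r_full = multiple_r(x1, x2, y)            # both original predictors
r_both_pcs = multiple_r(first, last, y)   # both components: merely a rotation
r_first_only = abs(corr(first, y))        # the last component discarded
print(r_full, r_both_pcs, r_first_only)
```

Here each predictor taken singly correlates only one-third with $y$, the pair predicts it perfectly, and the first principal component alone predicts it not at all.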
The successful determination of exact probability distributions for the
multiple correlation, canonical correlations, the generalized analysis of variance
and Student ratio, and certain other modern methods has in each case hinged
on the invariantive property of the statistics used. It is apparently only by 
restricting the choice of statistics to such invariants that the appalling number 
of nuisance parameters in the distributions of functions of numerous correlated 
variates can be reduced sufficiently to bring the sampling errors under control. 
The necessity of sampling distributions in drawing any reliable inferences 
thus points to the use only of functions of observations possessing invariantive 
properties, since no exact sampling distributions seem to be obtainable otherwise. 
It is true that asymptotic standard errors are used in default of exact distributions ; 
but in the absence of any known bound for the error resulting from this practice, 
it cannot be recommended excepting in desperate extremities. Far better is 
the use of statistics for which exact distributions can be determined, ‘exact’ in 
the rather technical sense of freedom from any unnecessary dependence on 
nuisance parameters, and ‘exact’ also in the ordinary sense of accurate 
computability. 
In a matrix of covariances the only thing invariant under all non-singular
linear transformations of the variates is the rank. For covariance matrices of
observations, constituting a sample from a multivariate normal or any other
continuous distribution, the actual rank is virtually always equal to the number
[…]

In the latter category is a restriction to orthogonal transformations. Under
[…] are invariants, the latent roots […] correlations.

When we come to the comparison of two […] invariants under all non-singular
[…]. For a pair of symmetric […] non-singular, a complete set of invariants […]
the determinantal equation

$$ |A - \lambda B| = 0, $$

each with an associated vector. No restriction to orthogonal transformations is
now needed to get meaningful invariants.
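
A small numerical check of this invariance may be helpful. The sketch below treats only 2 x 2 matrices, where the determinantal equation is a quadratic in $\lambda$; the matrices `a`, `b` and `c` are arbitrary illustrative choices, and the transformation of the variates by `c` is taken to send $A$ to $CAC'$ and $B$ to $CBC'$.

```python
import math


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose of a matrix given as a list of rows."""
    return [list(r) for r in zip(*a)]


def roots(a, b):
    """Roots of |A - lambda B| = 0 for 2 x 2 matrices, greatest first."""
    # Expanding the determinant gives qa * lambda^2 + qb * lambda + qc = 0.
    qa = b[0][0] * b[1][1] - b[0][1] * b[1][0]
    qb = -(a[0][0] * b[1][1] + a[1][1] * b[0][0]
           - a[0][1] * b[1][0] - a[1][0] * b[0][1])
    qc = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = math.sqrt(max(qb * qb - 4 * qa * qc, 0.0))
    return sorted([(-qb + disc) / (2 * qa), (-qb - disc) / (2 * qa)], reverse=True)


a = [[4.0, 1.0], [1.0, 3.0]]   # illustrative symmetric matrix
b = [[2.0, 0.5], [0.5, 1.0]]   # illustrative non-singular symmetric matrix
c = [[1.0, 2.0], [0.0, 3.0]]   # non-singular transformation of the variates

r_original = roots(a, b)
a_new = matmul(matmul(c, a), transpose(c))
b_new = matmul(matmul(c, b), transpose(c))
r_transformed = roots(a_new, b_new)
print(r_original, r_transformed)
```

For these matrices the quadratic is $1.75\lambda^2 - 9\lambda + 11 = 0$, with roots $22/7$ and $2$, and the same roots are recovered after the transformation.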


[…] $\lambda_1, \ldots, \lambda_p$ involves […] variates $p$, and the numbers of […].
This distribution was determined virtually simultaneously by S. N. Roy [14], [15],
P. L. Hsu [12], and R. A. Fisher [7]. To test the […] covariance matrices for
the […] select some function of $\lambda_1, \ldots, \lambda_p$ that will be sensitive […]
samples, or where […] extension of the separation in univariate analysis of […]
of squares from the rest. The familiar univariate […] of main concern to the
error sum of squares, multiplied […] of degrees of freedom, is generalized, […]
estimates, by either $AB^{-1}$ or $B^{-1}A$. […]
by one number in order to carry out a test of significance to see whether $A$ represents
deviations substantially beyond what can be expected by chance of the sort
revealed by the discrepancies between replications summarized in $B$. This
number may be the determinant

$$ |AB^{-1}| = |B^{-1}A| = |A|/|B| = \lambda_1 \lambda_2 \cdots \lambda_p, $$

whose distribution was studied, with this use in view, by S. S. Wilks [17]. It
is the statistic indicated by the likelihood ratio criterion. The use of the greatest
of the roots has been developed by M. S. Bartlett [1] and later papers [2, 3], and
S. N. Roy in the multivariate analysis of variance [16], and presents advantages
in numerous applications.

Another method makes use of the sum of the roots. This arose originally
from use of the generalized Student ratio $T^2$ to measure the deviation of a bomb
from its target, against a background of recorded errors in range and deflection
of such bombs. I have written about this distribution in three symposium
volumes: uses of it in Techniques of Statistical Analysis (1947), its mathematics
in the Proceedings of the Second Berkeley Symposium [10], and its relation to
other multivariate methods in Statistics and Mathematics in Biology [11]. The
joint distribution of the squares of the canonical correlations is closely related
to that of $\lambda_1, \ldots, \lambda_p$, and has been studied by P. L. Hsu in Biometrika for 1941.
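
The three single-number summaries of the roots mentioned above (their product, the greatest root, and their sum) can be computed without solving the determinantal equation separately for each, since the product is $|A|/|B|$ and the sum is the trace of $B^{-1}A$. The sketch below does this for a 2 x 2 case; the matrices are arbitrary illustrative values, and no attempt is made to attach significance levels to the results.

```python
import math


def criteria(a, b):
    """Product, sum and greatest of the roots of |A - lambda B| = 0 (2 x 2 case)."""
    det_a = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    det_b = b[0][0] * b[1][1] - b[0][1] * b[1][0]
    # Inverse of B, then the diagonal of B^{-1} A for the trace.
    inv_b = [[b[1][1] / det_b, -b[0][1] / det_b],
             [-b[1][0] / det_b, b[0][0] / det_b]]
    m00 = inv_b[0][0] * a[0][0] + inv_b[0][1] * a[1][0]
    m11 = inv_b[1][0] * a[0][1] + inv_b[1][1] * a[1][1]
    product = det_a / det_b          # the determinant criterion
    total = m00 + m11                # the sum of the roots
    greatest = (total + math.sqrt(max(total * total - 4 * product, 0.0))) / 2
    return product, total, greatest


a = [[4.0, 1.0], [1.0, 3.0]]   # illustrative matrix of main concern
b = [[2.0, 0.5], [0.5, 1.0]]   # illustrative error matrix
product, total, greatest = criteria(a, b)
print(product, total, greatest)
```

For these values the roots are $22/7$ and $2$, so the product is $44/7$, the sum $36/7$ and the greatest root $22/7$.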

Problems of discrimination and classification among populations and 
individuals have been considered by many writers. Factor analyses appear to 
have little place in this problem, though it is occasionally mentioned. A. Wald 
proposed (Annals of Mathematical Statistics, 1944) a statistic for measuring the 
extent of misclassification, and with the help of its invariantive properties reduced 
its distribution to a triple integral. This was studied further by H. L. Harter 


(Ann. Math. Stat., 1951). 


IV. THE DIMENSIONALITY OF A MULTIVARIATE NORMAL DISTRIBUTION


Let us consider one further problem of long standing that helps to bring out 
the relations between factor analysis and methods of multivariate analysis of 
recent development. This is the question of the dimensionality of a multivariate 
normal distribution. The dimensionality is the same as the rank of the matrix 
of the observations, the covariance matrix, and the correlation matrix. Frequently 
asked questions are how, on the basis of a sample, to determine this rank, and 
how accurate the determination is. The answers are, in one sense, easy. The 
rank of a sample cannot be greater than that of the population from which it is 
drawn. For example, if a distribution of points in three-dimensional space 
is confined to a plane, then every sample of these points lies in the same plane. 
On the other hand, for a distribution having continuous positive density of 
dimensionality p, the probability that a sample of more than p points should lie 
entirely within a flat space of dimensionality less than p is zero. If the sample 
consists of p or fewer points, the dimensionality is one less than the sample size. 
Thus, apart from possibilities whose total probability is zero, we may say that
if there are more individuals than variates, the rank of the sample matrices is
exactly that of the population. This is a complete answer to the questions as
they are most often asked.
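
These statements about rank can be illustrated numerically. The sketch below, offered only as an illustration with simulated data and hypothetical function names, finds the rank of the matrix of deviations from the sample means by Gaussian elimination with a small tolerance (an assumption needed because of rounding error) in three situations: a full three-dimensional distribution, a distribution confined to a plane, and a sample of only three points.

```python
import random


def centre_columns(rows):
    """Express every observation as a deviation from the column means."""
    n = len(rows)
    means = [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]
    return [[r[j] - means[j] for j in range(len(means))] for r in rows]


def rank(rows, tol=1e-9):
    """Rank of a matrix by Gaussian elimination; pivots below tol count as zero."""
    m = [r[:] for r in rows]
    rk = 0
    for col in range(len(m[0])):
        if rk == len(m):
            break
        piv = max(range(rk, len(m)), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < tol:
            continue
        m[rk], m[piv] = m[piv], m[rk]
        for r in range(rk + 1, len(m)):
            f = m[r][col] / m[rk][col]
            for c in range(col, len(m[0])):
                m[r][c] -= f * m[rk][c]
        rk += 1
    return rk


rng = random.Random(3)  # illustrative data only

# Ten points from a continuous three-dimensional distribution.
full = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(10)]
rank_full = rank(centre_columns(full))

# Ten points confined to the plane z = x + y.
planar = []
for _ in range(10):
    x, y = rng.gauss(0, 1), rng.gauss(0, 1)
    planar.append([x, y, x + y])
rank_planar = rank(centre_columns(planar))

# Only three points in three dimensions: one less than the sample size.
rank_small = rank(centre_columns(full[:3]))
print(rank_full, rank_planar, rank_small)
```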


This answer is however unlikely to satisfy our questioner. The rank of an 
empirical matrix of observations (or of their correlations or covariances) is 
practically always equal to the number of variates; and he wants to represent 
these by a smaller number of variates, and is looking for a statistician’s blessing 
on this labour-saving procedure. His real problem is not whether in reality 
there are fewer dimensions than variates, but whether he will make much of an 
error if he proceeds as if there were. It is of course impossible to give him 
satisfactory assurances without knowing exactly what it is that he is ultimately 
going to do with his calculations, and even if one does know this it is difficult. 
There is a chance of considerable mischief ultimately […] since so much has to be
neglected, it is not […] the method of principal […] to estimate this matrix […]
that independent estimates shall be available of the same true part. If $g$
independent estimates, […] speak of ‘components’ […] if possessed of certain […]
cannot […] genetic factors, whatever mathematical […] matrix. Thus the scope
of valid applications […] must be reduced by the elimination of genetics.

[…] translation published in full in the […] author is indebted to the authorities
of the Centre National de la Recherche Scientifique for permission to publish an
English version here.


REFERENCES 

[1] BARTLETT, M. S. (1934). Probability and chance in the theory of statistics.
Proc. roy. Soc. A, CXLI, 518.

[2] BARTLETT, M. S. (1948). Internal and external factor analysis. Brit. J. stat.
Psychol., I, 73.

[3] BARTLETT, M. S. (1950). Tests of significance in factor analysis. Brit. J. stat.
Psychol., III, 77.

[4] BOSE, R. C., and ROY, S. N. (1938). The distribution of the studentised D²-
statistic. Sankhyā, IV, 19.

[5] FISHER, R. A. (1921). On the mathematical foundation of theoretical statistics.
Phil. Trans. roy. Soc. A, CXXII, 309.

[6] FISHER, R. A. (1928). The general sampling distribution of the multiple correlation
coefficient. Proc. roy. Soc. A, CXXI, 654.

[7] FISHER, R. A. (1939). The sampling distribution of some statistics obtained from
non-linear equations. Ann. Eugen., IX, 238.

[8] HOTELLING, H. (1931). The generalization of ‘Student’s’ ratio. Ann. math.
Statist., II, 360.

[9] HOTELLING, H. (1936). Relations between two sets of variates. Biometrika, XXVIII,
321.

[10] HOTELLING, H. (1950). Ap. Proceedings of the Second Berkeley Symposium. University
of California Press.

[11] HOTELLING, H. (1954). Ap. Statistics and Mathematics in Biology. Iowa State
College Press.

[12] HSU, P. L. (1939). On the distribution of roots of certain determinantal equations.
Ann. Eugen., IX, 250.

[13] PEARSON, KARL (1926). On the coefficient of racial likeness. Biometrika, XVIII, 105.

[14] ROY, S. N. (1939 a). A note on the distribution of the studentised D²-statistic.
Sankhyā, IV, 373.

[15] ROY, S. N. (1939 b). p-statistics, or some generalisations on analysis of variance
appropriate to multivariate problems. Sankhyā, IV, 381.

[16] ROY, S. N. (1942). Analysis of variance for multivariate normal populations.
Sankhyā, VI, 35.

[17] WILKS, S. S. (1932). Certain generalizations in the analysis of variance. Biometrika,
XXIV, 471.




Vol. X The British Journal of Statistical Psychology November
Part II 1957


THE DISTINCTION BETWEEN COMMON AND SPECIFIC 
VARIANCE IN FACTOR THEORY 


By CHARLES WRIGLEY¹
University of California 


Abstract. The purpose of this paper is to indicate certain inadequacies in the
current use of ‘communalities’ in factor theory. There are considerable
divergences in the way the term is defined; and several reasons for believing that,
as ordinarily computed, the values become highly unstable when the number of 
tests or the size of the sample is increased. It is accordingly suggested that a 
better measure of the common factor variance might be furnished by substituting 
the square of the multiple correlation. 


I. THE COMMON FACTOR THEORY

Factor theory assumes the variance of any test to be divisible into three 
parts : (i) variance which is common to other tests as well; (ii) variance which is 
reliable but specific to that test; (iii) error variance which arises from test unre- 
liability. Specific and error variances may be combined to form the unique 
variance. This threefold distinction was introduced because of the unusual 
limitations shown by most psychological measurements. The psychologist 
naturally wants to exclude from his analysis error variance of the kind revealed 
by changes in scores from one occasion to another, or from parallel test forms 
used on the same occasion; and the reliability coefficient might seem the 
appropriate value to insert in the diagonal in order to achieve this. But he also 
seeks to exclude specific variance from his analysis, since, unless reliable variance 
can be shown to be present in two or more tests, it may conceivably represent 
something peculiar to one particular test only. These two contributions to the 
total variance—the specific and error variance—not only introduce further factors 
of no psychological interest, but may even become confounded with the common 
variance, thereby confusing the description of factors having psychological 
relevance. Hence it is commonly argued that the factorist should restrict his 
analysis to the common factor variance. 

The term ‘communality’ has been defined by various writers in different
and sometimes conflicting ways. Through all their definitions, however, there
runs the notion that communalities are reduced values to be inserted in the
leading diagonal, and that, after extraction of k common factors (k < p, the number
of tests), they become either zero or very small. The communalities are taken to
represent the ‘common variance’, and the variance which is thus excluded from
the diagonal entry to represent the ‘uniqueness’.

¹ This research was supported by the United States Air Force under Contract No. AF 41(657)-76,
monitored by Air Force Personnel and Training Research Center, ATTN: Dr. John M.
Leiman, Personnel Research Laboratory, Post Office Box 1557, Lackland Air Force Base, San
Antonio, Texas.




The disadvantages of introducing these reduced entries are well known.
Adoption of a factorial structure with more factors than tests complicates the
algebraic exposition and the difficulties of calculation. There is no generally
accepted method for finding the communalities; and all take considerable time
for a computer with only a desk machine. Yet most factorists regard these
difficulties as a reasonable price to pay for reducing the lack of precision.

In factor analysis there is always a twofold sampling. A sample of persons 
is selected from a population of persons and a sample of tests from a population 
of tests. To avoid confusion, the population of tests will be called the domain 
(Tryon’s convenient term [30], p. 4), and the sample of tests the selection. 
Alternatively, Guttman's term universe ([15], p. 282) might be used for the 
population of tests, and battery could be used for the sample. The ‘squared
multiple correlation’ between each test and the remaining (p − 1) tests will be
referred to as s.m.c.
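
Since the s.m.c. recurs throughout the argument, a small numerical sketch may help to fix ideas. It relies on the standard identity that the s.m.c. of test i equals 1 − 1/r^ii, where r^ii is the i-th diagonal element of the inverse correlation matrix. The function names and the three-test matrix are illustrative assumptions, not taken from the paper.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def squared_multiple_correlations(R):
    """s.m.c. of each test with the remaining p - 1 tests: 1 - 1 / r^ii."""
    inverse = invert(R)
    return [1.0 - 1.0 / inverse[i][i] for i in range(len(R))]


# Hypothetical correlations among three tests (illustration only).
R = [[1.0, 0.6, 0.5],
     [0.6, 1.0, 0.4],
     [0.5, 0.4, 1.0]]
smc = squared_multiple_correlations(R)
```

With only two tests the s.m.c. reduces to the squared correlation between them, which gives a quick check on any implementation.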


II. INADEQUACIES OF THE COMMON FACTOR THEORY 


The theory of common factors has four main inadequacies : (i) the existence 
of alternative sets of communalities satisfying the theory; (ii) the relativity of the 
distinction between common and specific variance, which depends in part on the 
particular selection of tests; (iii) the failure to distinguish explicitly between 
common and significant factors; (iv) when significant factors are insisted upon, 
the possibility of changes in the probability level adopted for significance or of an 
increase in sample-size resulting in more factors and increased values for the 
communalities. In certain forms of analysis there is a further difficulty, which is 
not a necessary consequence of the theory, but interlocks with it. Factors are
usually rotated. If their number is increased by increasing the size of the 
sample, a restructuring of rotated factors is likely, with the larger factors 
splitting into smaller. 

(i) Alternative Sets of Communalities. Various sets of communalities, 
based on varying numbers of common factors, will satisfy the theory, i.e. pro- 
vide residual correlations of exactly zero [22]. The solution with the fewest 
common factors is generally assumed to be the best. To those who hold that 
factors are only operational fictions, convenient for classifying tests, this 
principle of parsimony presents no special difficulty. Since the s.m.c. of any 
test measures only common variance, a satisfactory case can be made for accepting 
the set of communalities with minimal deviations from the s.m.c.’s, so long as we 
are content with operational definitions. But to those who consider factors to 
represent real entities, the matter is not so easy, since it is hard to see any strong 
psychological reason why the solution with fewest factors is more likely to be the 
‘true’ representation. This difficulty becomes still more acute when various
sets of communalities are possible for the same minimal number of factors : 
see Thomson ([26], pp. 81-82) for an example. 

(ii) Relativity of the Distinction. The distinction between common and 
specific variance is relative, and depends on the particular tests selected. By
adding suitable tests, specific variance can be converted to common variance.
In Thurstone’s analysis of perceptual tests ([28], p. 96), the white-black flicker
fusion test is almost completely specific, with a communality of only 0·12. But
elsewhere in the same study (p. 23), a correlation of 0·41 of the white-black
flicker fusion test is reported with the blue-yellow flicker fusion test (a test 
excluded from the factor analysis). Presumably a flicker fusion factor would 
have been isolated if tests for white-black, blue-yellow, and red-green flicker 
fusion had all been included; and the communality for the white-black test would 
then have been much higher. 

On the other hand, common factor variance can be reduced to specific 
by excluding all but one of the tests with appreciable loadings for the common 
factor. Thurstone’s perceptual study again provides an example. The
reaction time factor (Factor C) has loadings of 0·73 for visual reaction time and
0·68 for auditory reaction time, but no other loading higher than 0·36. If
either test had been excluded, Factor C would probably not have been isolated;
and the communality for the remaining test would have been much smaller.

The communality also varies according as the sub-tests are scored separately
or as a single composite score. In Thurstone's study of ‘primary mental
abilities’ [27] tests of addition, subtraction, multiplication, and division were
scored independently; but other investigators (e.g. the United States Army Air
Force, [9], p. 80) have incorporated these into a single arithmetical test. On
a priori grounds the composite might be expected to possess a larger common
variance than any constituent sub-test, since it is longer and has greater reliability.
But in practice it generally has a smaller communality. With one numerical 
score instead of four, variance previously regarded as common thus becomes 
classified as specific. 

According to the mode of selection, therefore, it seems possible for the 
communality to range between zero and the reliability. Factorial theory can be 
restated in various ways to allow for this. First, we might declare that no two 
tests be included in the same analysis if too similar. Two reaction tests might be 
included, for example, when they relate to two different sensory modalities; but
two flicker fusion tests would be excluded because both are visual. This seems
hardly acceptable. The difficulty of deciding what is too similar is great; and
investigators will seldom agree. Thurstone seems to have changed his mind
about the similarity of the two flicker fusion tests after his administration of the
tests. In any case, since one aim of factor analysis is to classify like tests together,
it seems undesirable to prejudge the issue by classifying them before the analysis. 

Secondly, the value attained for a representative finite selection of tests 
might be regarded as an approximation to the value which would be obtained if 
the analysis had been made for the whole domain. Guttman [11], [14] has 
investigated the consequences of this approach. If the number of common 
factors stays constant, then, as the number of tests grows larger, the communality 
will remain unchanged, but the s.m.c. will increase. The two will coincide 
when the number becomes infinite. Further, the s.m.c. will converge to the
communality even if the number of common factors increases, […] that the
ratio of factors to tests decreases to zero as the number of tests becomes
infinite. The difficulty is that it seems unlikely that the number of factors
will stay constant as further tests are added, or […]. The ‘upper bound’ for
the rank of the […] ([6], p. 200) and Ledermann [22] from […] of tests often
appears to give a reasonable estimate of the number of factors required for an
exact fit. Here, however, the ratio of factors to tests increases to one as the
number of tests increases to infinity. And […] this ‘upper bound’ may at times
underestimate the […]. Guttman [17] provides one example, and the well-known
[…] ([26], pp. 51-53) another. In a domain of tests, all variance hitherto
regarded as specific […] expected to be converted to common, so that in the
limit the communality would become equal to the parallel-form reliability.

Thirdly, we might make a virtue of necessity by stating our objective as that
of designing analyses so as to convert as much specific to common variance
as we can, thus narrowing the gap between communality and reliability so far as
possible. To do so might be held to demonstrate our control of the testing
situation and our power to express factorial results more generally. Or, fourthly,
[…] the relativity of common and specific variance […] on the ground
that our concern is with the ‘here and now’; in that case […] communality as
such; different communalities would […] different batteries. These last two
alternatives, which […] variable measure, probably […] the empirical […],
and might prove acceptable operationally; but they […] test variance as
divisible once and for all into […].
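
The behaviour attributed above to Guttman, that with a constant number of common factors the communality stays fixed while the s.m.c. rises towards it as tests are added, can be illustrated in the simplest possible case. The sketch below assumes a single common factor on which every test has the same loading a, so that every correlation equals a² and the communality is also a². For p such tests the s.m.c. has the closed form (p − 1)ρ² / (1 + (p − 2)ρ) with ρ = a². The equal-loading battery is an illustrative assumption, not an example from the paper.

```python
def smc_equal_loadings(loading, n_tests):
    """s.m.c. of any one test in a battery of n_tests, one factor, equal loadings."""
    rho = loading ** 2  # every off-diagonal correlation under the assumed model
    return (n_tests - 1) * rho ** 2 / (1 + (n_tests - 2) * rho)


loading = 0.7                 # hypothetical common-factor loading
communality = loading ** 2    # unchanged however many tests are added
battery_sizes = [2, 5, 20, 100, 1000]
smcs = [smc_equal_loadings(loading, p) for p in battery_sizes]

for p, value in zip(battery_sizes, smcs):
    print(f"p = {p:4d}   s.m.c. = {value:.4f}   communality = {communality:.4f}")
```

In this special case the s.m.c. is always a lower bound to the communality and approaches it only as the battery becomes very large.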
(iii) Failure to Distinguish Common and Significant Factors. […] of the
theory […] error arising from sampling of subjects. Thurstone never make[s]
[…] a common factor, in the algebraic sense, and a significant factor. At times
he implies that all error relates […] unreliability, […] to self-correlation
[…] the problem is solved once the reduced diagonal entries are known. But a
common factor […] still be […]. Suppose population values […] zero
correlation […] factor. This will […]. Nor is a significant factor necessarily
a common factor. A principal axes analysis with […] between common and
specific variance: there are as many factors as tests; and, since later factors
tend to have one or two loadings larger than the rest, they hardly conform to
Thurstone’s definition […] error variance that such factors might be expected
[…] be significant: i.e. we may […]
similar near-specific factors on analysing another sample of the same size from the
same population. Hotelling [18] and Bartlett [4] have devised significance tests
for this situation.

When the problem is formulated in a purely algebraic sense such as that of
reducing the correlation matrix to minimal rank, later factors may measure no more
than sampling errors; yet this error will be included in the communalities and
regarded as common factor variance. Our concern is therefore not with common
factors per se, but only with those common factors that can be established as
significant. As Burt has argued, “instead of aiming at a maximum number of
common factors, a safer policy is to aim at a minimum, retaining merely those that
are fully significant—one general factor only, if that will suffice to account for the
figures observed, adding others only when the evidence compels us” ([8], p. 112).

Thurstone seeks to restate the problem in statistical terms by writing : 
“ A matrix of minimal rank is to be obtained which deviates only slightly from 
the given side correlations” ([28], p. 283). But several difficulties arise from his
formulation. First, it is doubtful whether the algebra of rank is intended as a
guide to the number of factors, or whether his remark in the preceding paragraph
that “the smallest number of independent factors that will account for the
intercorrelations of n tests is the minimal rank of the correlation matrix” is meant
to relate to the practical question. Second, Thurstone, in his book, nowhere
indicates the size of discrepancy which is small enough to be ignored.

(iv) Dependence on the Sample of Persons and the Probability Level. The
problem of determining the number of significant common factors has been
considered by Lawley [21] and Rao [25]. Increase in sample size usually results
in reduced standard errors and in the attribution of significance (at the accepted
probability level) to variations which would otherwise have been taken to
represent merely sampling effects. It might be expected, therefore, that the number
of factors would increase (a) if a more lenient level were adopted, or (b) if the
sample of persons were increased. This in turn would lead us to identify as
common some of the variance hitherto accepted as unique.
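
The dependence of significance on sample size can be seen in miniature with a single correlation. The sketch below uses Fisher's z transformation, under which atanh(r) is approximately normal with standard error 1/√(N − 3). It is not the factor-number test of Lawley or Rao, and the correlation of 0·15 and the two sample sizes are hypothetical values chosen for illustration.

```python
import math


def correlation_z(r, n):
    """Approximate normal deviate for testing a zero population correlation."""
    return math.atanh(r) * math.sqrt(n - 3)


def is_significant(r, n, critical=1.96):
    """Two-sided test at roughly the 5 per cent level (critical value assumed)."""
    return abs(correlation_z(r, n)) > critical


r = 0.15  # hypothetical small correlation
small_sample = is_significant(r, 50)    # same correlation, 50 persons
large_sample = is_significant(r, 400)   # same correlation, 400 persons
```

The same observed value is passed over as a sampling effect in the smaller sample but accepted as significant in the larger, which is the mechanism by which further factors, and hence larger communalities, may emerge as N grows.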

Is there any evidence that this situation fails to hold for common factors ? 
Possibly the Thurstonian hypothesis of a limited number of common factors 
might be restated to apply to a population of persons, but (owing to random 
errors) not to a sample. As sample size increased, exact reduction of rank would
be expected with fewer factors, instead of with more, because factors which 
resulted only from sampling error would grow smaller, and would disappear if 
correlations could be obtained for the population. On this view, increasing the 
size of sample would be offset by smaller residuals, and the number of significant 
factors would remain unchanged. 

Two comments may be made. First, present evidence, though inconclusive, 
seems opposed to this view. Current neurological and genetic knowledge sug- 
gests greater complexity in the determination of the behaviour of the organism 
than can be accounted for by any faculty-type theory of a limited number of 
primary factors. With larger samples of persons and wider selections of tests […]



Secondly, this restatement requires that sample size be large enough to identify 
the full set of factors, but does not indicate how large a sample is needed. As we 
increase our sample, the number of common factors and the estimates of the 
communalities would change until the unknown number of common factors 
was reached. 

As long as no concrete evidence is available in its support, we should perhaps
view any hypothesis of a limited number of common factors with scepticism. It
seems safer to regard the number of significant common factors as depending on
sample size, with its corollary that the communalities will vary from one sample 
to another. If so, the assumption of a fixed division of reliable variance into 
common and specific portions again becomes untenable. As pointed out 
earlier, this relativity could be accepted by restating the objective as that of trying 
to narrow the gap between communality and reliability; in that case a large
[…] common factors otherwise unrecognised.


The most acute problem arises, however, because increase in the number of
factors not only provides us with the additional information represented therein,
[…] when the number of factors increases), […] loadings, with alterations in
signs as […]. This source of restructuring […] the problem could be overcome
[…] reliabilities, or unities, in the diagonal. Secondly, when factors are
rotated, […] factors to split into smaller: this […] when the quartimax
procedure [24] is applied to the […] Burt [8]. Unities in the diagonal […].
Guilford ([10], p. 286) has also noticed it. No doubt it is responsible for some of
the disagreement in the literature dealing with certain factors. The problem […]

[…] observations may here be made.

1. […] confined to communalities. When the […] a test of significance, it
becomes in part a function of size of sample, no matter what the diagonal entries
may be. Indeed, changes in structure are likely to be less for communalities than
for unities or reliabilities (though greater than with s.m.c.’s), because reducing
the diagonal values reduces the smaller factors proportionately.

2. Before including factors in a rotation, we might require them to be not 
only significant but also large enough to be psychologically important. Psycho- 
logists should certainly not rotate more factors than are significant; but it might 
be advantageous to operate with fewer. The map-maker does not try to put in 
all the known detail; and similarly the factorist might find loss of detailed
information counterbalanced by increased clarity. Reduction might be achieved in
one of two ways:

(a) We may introduce a rule for excluding factors which, though significant,
are not of psychological importance. Kaiser [19] has proposed the exclusion
of all factors for which the sums of the squares of the loadings are less than 1·00.
It seems a doubtful gain to accept in the system a factor which contributes
less information than any single test which it replaces. Tryon ([31], p. 17)
suggests that the practical importance of a factor might be better judged by its
highest loading than by its overall contribution; in that case we might reject
every factor whose loadings were all under (say) 0·15.

(b) We may adopt that particular set of rotational solutions which achieves
greatest clarity. This would require a definition of clarity and a way of measuring
it: e.g. it might be argued that the clearest structure was achieved when the
variance of the squared loadings in the rotated solution was greatest, the quartimax
principle being used to decide the number of factors as well as the preferred
rotation. Increasing the number of significant factors would then make no
difference beyond the point where greatest clarity was attained.
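
The two exclusion rules under (a) and the clarity measure under (b) are simple enough to state as code. The sketch below is only one reading of them: the function names are invented, the loading matrix is hypothetical, and clarity is taken as the variance of all squared loadings in the matrix, which is an assumption about how the quartimax principle would be applied here.

```python
def sums_of_squares(loadings):
    """Sum of squared loadings for each factor (columns of a tests x factors table)."""
    n_factors = len(loadings[0])
    return [sum(row[k] ** 2 for row in loadings) for k in range(n_factors)]


def retained_by_sum_of_squares(loadings, minimum=1.0):
    """Rule attributed to Kaiser: keep factors whose sum of squares reaches 1.00."""
    return [k for k, ss in enumerate(sums_of_squares(loadings)) if ss >= minimum]


def retained_by_highest_loading(loadings, threshold=0.15):
    """Tryon-style rule: drop a factor if all its loadings fall under the threshold."""
    n_factors = len(loadings[0])
    return [k for k in range(n_factors)
            if max(abs(row[k]) for row in loadings) >= threshold]


def variance_of_squared_loadings(loadings):
    """Clarity measure assumed here: variance of every squared loading in the table."""
    squares = [x ** 2 for row in loadings for x in row]
    mean = sum(squares) / len(squares)
    return sum((s - mean) ** 2 for s in squares) / len(squares)


# Hypothetical loadings of four tests on three factors (illustration only).
loadings = [[0.8,  0.3,  0.10],
            [0.7,  0.4, -0.05],
            [0.6, -0.5,  0.12],
            [0.7, -0.4,  0.08]]
by_sum = retained_by_sum_of_squares(loadings)
by_peak = retained_by_highest_loading(loadings)
```

For this table the two rules disagree about the second factor, which shows that the choice of rule, and not only the data, decides how many factors are carried into rotation.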

3. The problem is greatly reduced if a hierarchical structure is adopted. 
Possibly because its basis lies in faculty psychology, simple structure theory has 
tended to regard each factor as coordinate in status with every other. Burt [7], 
followed by Vernon [32], considers a hierarchical organization to be far more 
effective than a coordinate one for classifying tests. This can be illustrated from 
the analysis of ability. If only one factor is extracted, a general factor of intel- 
ligence will probably be obtained. If two factors are extracted, verbal tests will 
probably be separated from spatial and mechanical. If further factors are 
extracted, the verbal and the mechanical factors will each subdivide. Thurstone’s 
recent advocacy of second-order factors ([20], ch. 18) seems to represent a similar 
step towards hierarchical factors; so does Guilford’s schematization of factors
for thinking and memory [10]. 

A hierarchical system is akin to a genealogical chart: large factors are 
progressively subdivided into smaller. When the sample is small, few factors are 
significant, and only the higher levels can be established; with a larger sample, 
the chart can be filled in with greater detail. The rotation of greatest clarity is
then represented by a horizontal segment across the chart.

III. LACK OF AGREEMENT IN THE DEFINITION OF COMMUNALITY 

One reason for all these difficulties is the lack of an accepted definition of
communality. Most factorists agree about excluding specific and error variance
and invoking as few common factors as possible; but they differ as to how these
aims are to be realized. Some, like Spearman, Thurstone, and Guttman, 
start with a hypothesis about the formal structure of the tests, and define the 
communalities as the values ensuring the best fit to the postulated model. 
Others, e.g. Tryon, and Guttman in some of his contributions, define the 
communality in terms of relations between a particular test and a domain. 
Others again, like Lawley and Rao, define the communality in statistical instead 
of algebraic terms?. 

Each of these alternatives deserves a brief examination. Until there is some
agreement on the definition of communality, we can scarcely hope for any
agreement about the values to be accepted.

The Spearman Model. Spearman assumed that, once specific and error
variances had been removed, a single […] correlations. He himself scrupulously
[…] diagonal of his correlation tables. But, if his tables are to be factorized in the
usual way, then the adoption of his theory would entail choosing for the
communalities those values which most nearly reduce the matrix to rank one. To
demonstrate the validity of the theory, it is necessary to test it with a wide
variety of correlation matrices. This test, however, clearly revealed the need
for rejecting the Spearman hypothesis: the majority of the correlation matrices
actually obtained could not be reduced to rank one. Evidently the ‘two factor
theory’ was far too simple to accord with the facts.
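
What reduction to rank one demands of the off-diagonal correlations can be made concrete. If a single factor accounts for the correlations, each r_ij is the product of two loadings, so for any four tests the three products r_ab·r_cd, r_ac·r_bd and r_ad·r_bc coincide. The sketch below checks this classical tetrad condition, which is brought in here for illustration and is not set out in the passage above. The loadings are hypothetical and sampling error is ignored.

```python
from itertools import combinations


def implied_correlations(loadings):
    """Correlations reproduced by orthogonal factors (rows = tests, columns = factors)."""
    n = len(loadings)
    return [[1.0 if i == j else sum(x * y for x, y in zip(loadings[i], loadings[j]))
             for j in range(n)] for i in range(n)]


def max_tetrad_difference(R):
    """Largest gap among the three pairings of correlations for any four tests."""
    largest = 0.0
    for a, b, c, d in combinations(range(len(R)), 4):
        products = (R[a][b] * R[c][d], R[a][c] * R[b][d], R[a][d] * R[b][c])
        largest = max(largest, max(products) - min(products))
    return largest


# Hypothetical batteries: one general factor, and two unrelated group factors.
one_factor = implied_correlations([[0.9], [0.8], [0.7], [0.6], [0.5]])
two_factor = implied_correlations([[0.8, 0.0], [0.7, 0.0], [0.0, 0.8], [0.0, 0.7]])

gap_one = max_tetrad_difference(one_factor)
gap_two = max_tetrad_difference(two_factor)
```

The first battery satisfies the condition exactly and the second violates it grossly. With empirical matrices the question becomes whether the observed gaps exceed what sampling error would produce.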


[…] diagonal values will each […] Burt. In his earliest papers (Brit. J.
Psychol., III, 1909, pp. 94f., etc.) […] r_ij = r_ig r_jg, so that r_ii = r_ig² […]
each column. In later papers he […] calculate the saturations. Where the […]
(as in fact is most commonly the case), “a slight […] is necessary in applying
formula iii” (the summation equation […]), as the context indicates, […]
coefficient to a value agreeing more closely with the general […]. Thomson,
[…] the corresponding passages in Factors of the Mind (pp. 448, 462), […]
“not quite clear how he (Burt) filled in the blank diagonal cells” ([26], p. 25).
Burt, […] that the use of a fixed […] suggestion of using the highest […].
Since his own tables were always […] ‘smoothing’ seemed the quickest […]
adequacy of a single factor […] the values for the ‘self-correlation
coefficients’ […] arguing in a vicious circle […] Sheppard, […] equivalent to
postulating a matrix […] the appropriate value would […] this was adopted as a
convenient […] (personal communication).




The Thurstone Model. At present Thurstone’s hypothesis is probably the 
most widely accepted. Reverting to faculty psychology, he postulated a limited 
number of ‘ primary mental abilities’. As we have seen, his theory was stated 
algebraically as that of attaining minimal rank for the correlation matrix. How- 
ever, the definition of communality in terms of rank has proved difficult arith- 
metically. For one thing, no easy method has been developed for finding minimal 
rank: Albert’s procedure [1], [2] is too complex for regular use. Moreover 
both the number of factors and the values of the communalities have to be 
determined; and this double determination is all the harder because the two 
estimations are interdependent. Furthermore, if minimal rank be required in 
the literal sense of a solution that reduces all residuals to zero, then it would seem 
that the number of common factors can rarely be reduced below Ledermann’s
‘upper bound’, and sometimes not even as low as that. Hence in practice far
more factors have to be accepted than Thurstone apparently anticipated when
formulating his theory. For example, when s.m.c.’s are inserted in the leading 
diagonal, the well-known Holzinger and Harman correlation matrix, based on 24 
ability tests, has 13 positive latent roots: this means that at least 13 factors are 
needed to account for the correlations ([15], p. 153). 
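
Ledermann's ‘upper bound’ is easily evaluated. In its usual statement the number of common factors k for p tests must satisfy (p − k)² ≥ p + k, which is equivalent to k ≤ (2p + 1 − √(8p + 1))/2. The sketch below finds the largest whole number meeting that inequality. The function name is an assumption, and the formula is the standard one rather than one quoted in this paper.

```python
def ledermann_bound(n_tests):
    """Largest whole number k satisfying (p - k)^2 >= p + k for p = n_tests."""
    k = 0
    while k + 1 < n_tests and (n_tests - (k + 1)) ** 2 >= n_tests + (k + 1):
        k += 1
    return k


bound_24 = ledermann_bound(24)   # battery size of the Holzinger and Harman matrix
ratios = {p: ledermann_bound(p) / p for p in (6, 24, 100, 1000)}
```

The ratio of the bound to the number of tests rises steadily with the size of the battery, which is the sense in which the ratio of factors to tests tends to one.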

What is even more important, the actual factorial findings seem to be making 
Thurstone's hypothesis as unacceptable as Spearman's. Thurstone never
stated explicitly how many factors in all might be expected. His adoption of the 
phrase ‘ primary mental abilities’ in his analysis of cognitive tests [27] suggests 
that he expected a rather limited number; and his repeated insistence on the 
importance of parsimony supports this interpretation. Yet the number of factors 
empirically identified continues to increase. Guilford [10] lists no fewer than 40 
in his review of work on thinking and memory alone, and expects to discover 
more when appropriate tests are constructed. 

The Guttman Models. Guttman ([13], ch. 6) notes the inability of
Thurstone's theory to fit the data, and has proposed new formal models such as
the simplex, the circumplex, and the radex. These employ different relations
among factor loadings from those ordinarily adopted (viz. accounting for
correlations by the products of loadings). He accordingly defines communalities as
the values most nearly maintaining the postulated structure. Up to the present,
however, no allowance seems to have been made for errors resulting from
sampling. As with Spearman and Thurstone, therefore, we can only accept
Guttman's definition if his postulated structures are shown to be applicable to a
wide variety of empirical matrices.


Thurstone's account ([29], pp. 282-283) is not altogether clear. Thus in a footnote on p. 283,
he expresses his objective as finding the values which maintain the rank of the off-diagonal elements
in the correlation matrix. This may have been to avoid certain criticisms from Ledermann [22],
who had demonstrated that anomalous correlation matrices might occur in which minimal rank
was not associated with a minimal sum of communalities. It is difficult to give precise meaning
to Thurstone’s remarks about the rank of the off-diagonal entries unless we assume a minimal
rank requirement. For similar criticisms of this passage the reader may refer to this Journal,
IV, p. 144, and VII, p. 108.




The Tryon Model. Tryon [30] formulates the problem quite differently.
He defines the communality in terms of his domain-sampling theory, as the
squared correlation between the test and a domain of tests whose correlations
with the remaining p − 1 tests are proportional. With this definition, the
communality of a test can be shown to equal the correlation between it and a
hypothetical test having identical correlations with the remaining p − 1 tests in the
selection. Correlations are then estimated between the tests and their
hypothesized counterparts by psychometric formulae of the Spearman-Brown type.
He thereby obtains a set of quadratic equations which when solved give exact
values for the communalities ([30], p. 246).
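
For readers unfamiliar with formulae ‘of the Spearman-Brown type’, the general Spearman-Brown formula is sketched below: a test lengthened k times with parallel material, from an original reliability r, has reliability kr / (1 + (k − 1)r). This is the textbook formula only. Tryon's own quadratic equations are not reproduced here, and the numerical values are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Reliability of a test lengthened length_factor times with parallel material."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


doubled = spearman_brown(0.6, 2)     # hypothetical test of reliability 0.6, doubled
quadrupled = spearman_brown(0.5, 4)  # hypothetical test of reliability 0.5, quadrupled
```

The same formula lies behind the earlier remark that a composite of several sub-tests, being longer, has greater reliability than any one of them.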

The procedure has been restated by Kaiser [20] in determinantal form; and 
the resultant iterative technique is found to converge for artificially constructed 
matrices. It is not yet known whether with empirical matrices convergence is 
slow or fails to occur. Mathematical properties of the method have also been
examined by Guttman [15]. The conception of a test as drawn from a
hypothetical domain of similar tests is attractive, since psychometric theory has
usually paid more attention to the sampling of persons than to the selecting of
tests. The power of the method is shown by the fact that it leads to explicit
equations, and that their solution requires no knowledge or estimate of the number
of factors.


On the other hand, to postulate a domain of tests with proportional profiles
is only a necessary, not a sufficient, condition. There are further requirements
[…] predesignated domain would provide the requisite parallel tests; but the
domains for […] defined after the tests have been selected for analysis.

As we have seen, […] communalities are possible; and it is […] lead to the
minimal rank set, […] artificially constructed matrices. […] which the linear
dependence of […]. Because of this assumption […] of the Tryon-Kaiser method
[…]. Tryon has developed several different […]. Kaiser derives his […] a
hypothetical test […] Thurstone and the […]



An algebraic solution is designated in which all residuals are reduced exactly to 
zero. Hence a large number of factors will generally be needed in any small 
sample—more perhaps than might reasonably be expected to be meaningful. 
Secondly, Tryon explicitly admits that communalities will change as extra tests 
are added. ‘Proportional tests’ are merely fictional devices, introduced in order
to achieve a more compelling conceptualization; it is not asserted that such tests 
could be developed empirically. But definition of the communality in such terms 
means that the profile has to be extended to cover further tests added to the 
selection. 

For Tryon, therefore, the communality is always restricted to the context 
of the tests in a particular battery. Within that context he considers that, except 
for sampling variations, the communality will have only one value. Since he 
acknowledges that diagonal entries should be based on the ‘here and now’, 
i.e. on the particular selection of tests chosen by the investigator, one wonders 
whether a case might not be made for using the actual s.m.c. which measures 
variance held in common with the remaining p - 1 tests, instead of a fictional
s.m.c. This will be discussed later.
The Statistical Model. The final position to be considered is that in which
the problem is stated in statistical rather than algebraic terms. This is
represented by Burt’s procedure of correcting for uniqueness ([8], p. 113), Lawley’s
maximum likelihood method [21], and Rao’s canonical method [25]. The
communalities are then the values which enable the correlations (weighted in
accordance with the communalities) to be fitted as well as possible for the
significant factors, i.e. they are those which give the smallest sum of squares of
weighted residuals.

This includes some Thurstonian features, but rejects others. The number
of factors depends on significance, not on rank; but rank reappears in the need
to find diagonal values providing the best fit between correlations and factors, i.e.
which achieve rank reduction, not to the minimum, but to that implied by the
indicated number of significant factors; and these diagonal values are used in
applying significance tests. Many disadvantages of Thurstone’s algebraic
approach are overcome by such methods. The requirement that common factors
should also be significant factors is obviously in line with modern statistical
theory. Furthermore, the method is free from some of the theoretical difficulties
of Thurstone’s position. The distinction between common and unique variance
is no longer an attribute of a test per se, a position hardly tenable in view of
variability of the communality: instead it is made relative to the particular set of
tests and persons, and its justification resides simply in the fact that the loadings
provide the best fit to the correlations. We are no longer committed either to the
faculty theory of a limited number of factors or to the postulate that their
number either stays constant when tests increase, or increases at a slower rate.

Two difficulties remain. The first is logical. While dependence of the
communality upon the test selection may be psychologically acceptable, it is hard
to give psychological meaning to the variations in the communality that arise


92 Charles Wrigley 


from changes in the sample of persons. The approach seems to imply that in a 
very large sample practically all common factors will be significant, so that the 


biased statistic. Samples nearly always 
alities are estimated, and factors are teste 
The two problems might be separated b 
braically, e.g. by taking the minimal ran 


Population of persons. There w 
sample size; but there would b 


IV. COMMUNALITY, RELIABILITY, AND SQUARED MULTIPLE CORRELATION
The central argument of this paper has been that the distinction of common
variance has not been shown to be stable. This section will consider the
consequences if communalities should prove relative to the selection of tests and
persons.


selected. He does not 


itly; but, as Guttman has shown, it 
follows from his hypothesis. 


Since no satisfactory evidence has so far been 
ctors stays constant, insistence on an 


» the further factors 

But such a restatement seems 
to an appreciable change ina 
Secondly, since the theory 


1 ure of the common variance 
small factors should not be excluded in estimating it : 




otherwise calculated values would be merely an uneasy compromise between the 
value for a domain and that for the particular selection. 

Tryon, on the other hand, postulates a different domain for each test. 
Thus in a different way he requires, like Thurstone, that no new common 
factors be introduced in determining the common variance in the domain. The 
appearance of new common factors he meets by redefining the domains. Once 
this is conceded, it would seem to follow that, by adding a sufficiently large and 
representative set of tests, all specific variance will be converted to common. 
In the limit, therefore, the communality would equal the reliability : by 
reliability we here mean the correlation between a test and a parallel form, since 
a parallel form could be added as a further test. A thoroughgoing adoption 
of domain-sampling therefore seems logically to require abandoning the common- 
specific distinction altogether. In the domain, test variance will be divisible 
into two parts only, reliability and error, so that reliabilities should be inserted
in the leading diagonal.

The Communality as the Value Giving the Best Fit to the Observations. An
alternative case can be made for communalities on the ground that they provide
the smallest residuals. Thurstone apparently hoped that communalities would
give the best correlational fit, as well as measure common variance in a domain.
This would be providential. But the coalescence could only be achieved by the
unproven assumption of a limited number of common factors in the domain.
The correlations merely offer a summary description of the relations between
the test measurements. Hence if the aim is to fit the observed data as closely
as possible, it would seem better to seek to reproduce the test measurements:
i.e. if there is to be a reduction from p tests to k factors, the factors should be
those for which the factor measurements are most highly correlated with the
test measurements. This is the approach of the maximum likelihood method.
Factors so secured are those which provide the maximal canonical correlations
with the tests; and Rao accordingly prefers to call the procedure the canonical
method of factor analysis ([25], p. 94). The disadvantages of this formulation,
if we may judge from certain results when using the maximum likelihood
method, seem to be the possibility of obtaining diagonal values which do not
conform to customary expectations: e.g. when the sample is so small that only
one factor is significant, a canonical correlation between tests and factor of
1.00 can be secured by reproducing one test as the factor. More generally,
if the problem is to maximize the relations between known tests and unknown
factors which are functions of the tests themselves, something resembling
spurious correlation may conceivably arise. This seems to hold for all
communality formulations. Since the factor loading is the correlation between test
and factor, the communality is equal to the correlation of any test with the full
set of common factors. The test is therefore to be correlated with factors
of which it forms part. Such a theory seems to leave us free to arrange the
conditions of the right-hand side of the test-factor relation so that we can
“pull ourselves up by our own bootstraps”.




The Squared Multiple Correlation as a Measure of the Common Variance in
the Test Selection. The possibility of using s.m.c.’s deserves greater
consideration than it has so far received. Both in theory and in practice the communality
has proved a somewhat intractable concept. If s.m.c.’s were inserted in the
leading diagonal, the distinction between common and specific variance would
be based on the test selection inste: 


known that the s.m.c. 


“common variance’ introduced by 
is therefore a measure of the extent to 
y the factors. Can this fairly be called 
nce initially was considered to be variance 
but the communality is now seen to include a 


common variance ? Common varia 
common to two or more tests, 

certain amount of variance that i 
of itself. It is scarcely psycholo 
“common variance ’, The dile 
that the increment represents 
the selection to the limit ; but this li 


¢ r ning the number of significant factors. One 
disadvantage is the lack of a significance test ; but this is a problem for the 


e test per se, transferable to 
tween that test and the other 
i Pancies between s.m.c.'s and 
ight Suggest defects in test selection. Computa- 


need investigation: (a) the size of the increments
from s.m.c.’s to communalities; (b) the extent to which factorial results are
changed by using s.m.c.’s for communalities.


In default of more direct evidence, 
atives and their implications. At 
Dcrements remain constant from 
ets of principal axes loadings would then be 
ce being that the latent roots are greater with 
er by the value of the Constant ; so that with 


Proportional, the only differen 
the latter than with the form, 




communalities the minor factors play a larger part in rotations. In this case
therefore the use of s.m.c.’s would make little difference in practice; but the
s.m.c.’s would be simpler computationally.

The key issue is then represented by the other extreme, namely, where
increments differ greatly from one variable to another. Here communalities
have definite algebraic advantages, provided that they can be accurately
determined: (a) the correlation matrix is then Gramian, i.e. there will be no negative
latent roots with associated imaginary loadings; (b) the diagonal values with
which the analysis starts are reproduced by the factor loadings. But even here
the psychological advantages seem to rest with the s.m.c.’s. Consider the
artificial correlations in the table below. In the first case the communalities
are approximately proportional to the s.m.c.’s; in the second the factor loadings
have been chosen so that there will be a marked discrepancy between the s.m.c.
and the communality for the fifth variable. For this the correlations in the
second case are only 0.045 larger than in the first; yet we are asked to believe that


TO ILLUSTRATE THE EFFECTS OF MINOR CHANGES IN FACTORIAL
STRUCTURE ON THE COMMUNALITIES AND THE SQUARED MULTIPLE CORRELATIONS

                                                                    Squared
   Factor                                               Commun-     multiple
  loadings                 Correlations                 alities     correla-
                                                                     tions

 0.90    -       -      0.72    0.63    0.54    0.09     0.810       0.622
 0.80    -     0.72      -      0.56    0.48    0.08     0.640       0.546
 0.70    -     0.63     0.56     -      0.42    0.07     0.490       0.426
 0.60    -     0.54     0.48    0.42     -      0.06     0.360       0.315
 0.10    -     0.09     0.08    0.07    0.06     -       0.010       0.009

 0.90   0.05     -      0.7225  0.6325  0.5425  0.1350   0.812       0.626
 0.80   0.05   0.7225    -      0.5625  0.4825  0.1250   0.642       0.549
 0.70   0.05   0.6325   0.5625   -      0.4225  0.1150   0.492       0.429
 0.60   0.05   0.5425   0.4825  0.4225   -      0.1050   0.362       0.318
 0.10   0.90   0.1350   0.1250  0.1150  0.1050   -       0.820       0.022


in the first case the common factor variance is only 1 per cent of the test’s total
variance, whereas in the second case it amounts to 82 per cent. If someone
unfamiliar with factorial theory were confronted with these two cases, he would
surely regard the s.m.c. as offering the better measure of the ‘commonness’
of the fifth variable. No assertions that the communality is the value attained
by the s.m.c. in a domain of tests would affect his verdict. Rather he would
believe that Thurstone had allowed the algebra of the situations to dominate
over the psychology.
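
The contrast drawn from the table can be checked numerically. The sketch below is an illustration only, not part of the original paper: it assumes uncorrelated factors, rebuilds each correlation matrix from the tabled loadings, and obtains the s.m.c. of each variable from the standard identity s.m.c. = 1 - 1/(i-th diagonal element of the inverse correlation matrix). The function names are hypothetical, and a zero second loading stands in for the blank entries of the first case.

```python
def correlations(loadings):
    """Correlation matrix implied by uncorrelated factors, with unit diagonal."""
    p = len(loadings)
    return [[1.0 if i == j else sum(a * b for a, b in zip(loadings[i], loadings[j]))
             for j in range(p)] for i in range(p)]


def communalities(loadings):
    """Sum of squared loadings for each variable."""
    return [sum(a * a for a in row) for row in loadings]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def squared_multiple_correlations(r):
    """s.m.c. of each variable with the remaining p - 1 variables."""
    inverse = invert(r)
    return [1.0 - 1.0 / inverse[i][i] for i in range(len(r))]


# Loadings as tabled; the first case has a single factor.
CASE_ONE = [[0.90, 0.0], [0.80, 0.0], [0.70, 0.0], [0.60, 0.0], [0.10, 0.0]]
CASE_TWO = [[0.90, 0.05], [0.80, 0.05], [0.70, 0.05], [0.60, 0.05], [0.10, 0.90]]

for label, loadings in (("first case", CASE_ONE), ("second case", CASE_TWO)):
    h2 = communalities(loadings)
    smc = squared_multiple_correlations(correlations(loadings))
    print(label)
    for i, (h, s) in enumerate(zip(h2, smc), start=1):
        print(f"  variable {i}: communality {h:.3f}, s.m.c. {s:.3f}")
```

For the fifth variable the communality moves from 0.010 to 0.820 between the two cases, while the s.m.c. stays near zero, which is the discrepancy the argument turns on.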
S.m.c.’s, of course, do not answer all the issues raised in this paper. Thus,
if significant factors alone are rotated, their number may increase as the sample
of persons increases; and that may lead to a radical restructuring. The fact
that the factorial results may very possibly change with increase in the sample
of persons, or in the selection of tests, should warn us that any hope that factors 
can be regarded as real causal entities is unlikely to be realized. Factors must 
be regarded as operational fictions. Our task is to fix the rules governing their 
extraction so that our practical problems, e.g. the classification of the tests, can 
be most skilfully and most expeditiously handled. And it certainly seems 
possible that s.m.c’s may serve these purposes better than communalities. 


V. Summary 


eving that the actual values found 
ernative sets of communalities in 
tion to decide between them ; and 

on human convenience than on 
Secondly, standard theory seems to have 


ariance, thereby increasing 


the communality, educe common variance to 


specific. 
Once the distinction betwee 


n common and specific varianc: 
as operational, i.e. as relative to th 


Specific variance can in 
to common by adding enough tests, The principal claim 
they givea better fit than any other diagonal 
ations actually observed. The psychological advantages, 
uared multiple correlati 





REFERENCES

[1] Albert, A. A. (1944). The matrices of factor analysis. Proc. Nat. Acad. Sci.,
XXX, 90-95.

[2] Albert, A. A. (1944). The minimum rank of a correlation matrix. Proc. Nat.
Acad. Sci., XXX, 144-146.

[3] Banks, C., and Burt, C. (1954). The reduced correlation matrix. Brit. J. Stat.
Psychol., VII, 107-118.

[4] Bartlett, M. S. (1950). Tests of significance in factor analysis. Brit. J. Psychol.
Stat. Sect., III, 77-85, IV, 1-2.

[5] Burt, C. (1917). The Distribution and Relations of Educational Abilities. London:
P. S. King.

[6] Burt, C. (1952). Bipolar factors as a cause of cyclic overlap. Brit. J. Psychol.
Stat. Sect., V, 197-202.

[7] Burt, C. (1949). The structure of the mind; a review of the results of factor analysis.
Brit. J. Educ. Psychol., XIX, 100-111, 176-199.

[8] Burt, C. (1949). Alternative methods of factor analysis and their relations to
Pearson’s method of ‘Principal Axes’. Brit. J. Psychol. Stat. Sect., I, 98-121.

[9] Guilford, J. P., ed. (1947). Printed Classification Tests. Washington, D.C.:
U.S. Govt. Printing Office. Army Air Forces Aviation Psychology Program
Research Report, No. 5.

[10] Guilford, J. P. (1956). The structure of intellect. Psychol. Bull., LIII, 267-293.

[11] Guttman, L. (1940). Multiple rectilinear prediction and the resolution into
components. Psychometrika, V, 75-99.

[12] Guttman, L. (1953). Image theory for the structure of quantitative variates.
Psychometrika, XVIII, 277-296.

[13] Guttman, L. (1954). A new approach to factor analysis: the radex. In Paul F.
Lazarsfeld, Mathematical Thinking in the Social Sciences. New York: Columbia
University Press.

[14] Guttman, L. (1954). Some necessary conditions for common-factor analysis.
Psychometrika, XIX, 149-161.

[15] Guttman, L. (1956). Best possible systematic estimates of communalities.
Psychometrika, XXI, 273-285.

[16] Guttman, L. (1956). Improved bounds for communalities. Research Report.

[17] Guttman, L. (1956). To what extent can communalities reduce rank? Research
Report.

[18] Hotelling, H. (1933). Analysis of a complex of statistical variables into principal
components. J. Educ. Psychol., XXIV, 417-441, 498-520.

[19] Kaiser, H. F. (1956). The varimax method of factor analysis. Ph.D. Thesis, on
file in the University of California Library.

[20] Kaiser, H. F. (1956). Solution for the communalities: a preliminary report.
Research Report.

[21] Lawley, D. N. (1940). The estimation of factor loadings by the method of maximum
likelihood. Proc. Roy. Soc. Edin., LX, 64-82.

[22] Ledermann, W. (1937). On the rank of the reduced correlation matrix in multiple
factor analysis. Psychometrika, II, 85-93.

[23] Ledermann, W. (1940). A problem concerning matrices with variable diagonal
elements. Proc. Roy. Soc. Edin., LX, 1-17.

[24] Neuhaus, J. O., and Wrigley, C. (1954). The quartimax method: an analytic
approach to orthogonal simple structure. Brit. J. Stat. Psychol., VII, 81-91.

[25] Rao, C. R. (1955). Estimation and tests of significance in factor analysis.
Psychometrika, XX, 93-111.

[26] Thomson, G. H. (1951). The Factorial Analysis of Human Ability. London:
University of London Press.

[27] Thurstone, L. L. (1938). Primary Mental Abilities. Chicago: University of Chicago
Press.

[28] Thurstone, L. L. (1944). A Factorial Study of Perception. Chicago: University of
Chicago Press.

[29] Thurstone, L. L. (1947). Multiple Factor Analysis. Chicago: University of Chicago
Press.

[30] Tryon, R. C. (1956). Communality of a variable: formulation by cluster analysis.
Research Report.

[31] Tryon, R. C. (1956). Cumulative communality cluster (CCC) analysis. Research
Report.

[32] Vernon, P. E. (1950). The Structure of Human Abilities. London: Methuen.



Vol. X The British Journal of Statistical Psychology November 
Part II 1957 


THE RELATIVE INFLUENCE OF HEREDITY AND
ENVIRONMENT ON ASSESSMENTS OF INTELLIGENCE


By CYRIL BURT AND MARGARET HOWARD


The Application of the Parallelogram Law. In several previous publications 
we have briefly referred to a graphical method of representing and computing 
the effects of genetic and non-genetic factors which is based on the same principle 
as the familiar diagram used for illustrating the composition and resolution of 
forces in physics, the diagram popularly known as the ‘parallelogram of
forces’. The method we propose treats the variables concerned as measurable
tendencies or ‘ vectors’ obeying the ordinary rule for the addition of vector 
quantities. It has proved specially helpful in attempts to analyse and interpret 
the factors involved in measurements of intelligence. Results obtained in this 
way were cited in a previous paper on multifactorial inheritance in this Journal 
(IX, pp. 126-7). Several correspondents have since asked for a more detailed 
description of the procedure. We therefore offer a short account of the working 
method, illustrated by data given in the article just quoted. 

In explaining the principles on which the calculations are based, we should 
like to emphasize at the outset that we regard the values so reached as representing 
only approximate estimates for the hypothetical variables concerned. The 
approximation is, we believe, sufficiently exact for use with measurements of the 
kind we have in view, since their error or unreliability is in any case likely to be 
rather large ; but we do not claim that the method will furnish the most efficient 
estimates, such as might satisfy the ideals of the mathematical statistician. 
Indeed, Mr. Alan Stuart, who has been good enough to read and comment on 
an early draft of this note, states that in his opinion the type of problem here 
raised is one which still awaits an adequate theoretical investigation. 

Hypothetical Analysis. In applying the procedure to the problem mentioned 
we may start by dividing the conditions which affect our assessments of intelli- 
gence into the following hypothetical factors. 

I. Genetic or Heritable Factors (H). These include (i) the Fixable contribu- 
tions resulting from ‘ the essential genotypes ' ; (ii) the Non-fixable contributions 
due chiefly to the effects of partial dominance ; and in addition to these it is 
usually desirable to allow for (iii) the effects of Assortative Mating. (Formal 
definitions of these terms are given in the article already quoted, p. 106.) For
our present purpose it will not be necessary to deal with these subclasses
separately.


II. Non-genetic or Non-heritable Factors (N). These include (i) Random
Environmental Effects, i.e. the effects of environment so far as it operates in
complete statistical independence of the genetic tendencies (R), (ii) Systematic
Environmental Effects, i.e. the effects of environment so far as it operates in the
same direction as the genetic tendencies (S), (iii) a contribution due to the
Interaction of heredity and environment (if any), i.e. to the joint effects of both
when they are correlated (J), (iv) all other unidentified influences (errors of
measurement, sampling errors, etc.), which may be summed up under the general
term Unreliability (U).


Data. The data availab 


om somewhat smaller groups, 
of the nearer relatives, and of twins, 
Itural conditions of a representative 


for the apparent intelligence of the parents, 
as well as ratings for the economic and cu 


From these various measurements or assessments we have computed the 
following coefficients : 


(i) Correlations between 
US to assess the amount of ‘ 


(ii) Correlations betwee 
together and (b) 


t cousins: these by exclusion 
of pre-natal environment ; m 
ll as to be virtually negligible. 
fispring, (b) siblings, (c) fathers 
tive influence of the first three 
as compared with the influence 


‘heritable’ condit 
of environment, 

Preliminary Analysis of Variance. Using these correlations, and relying
mainly on the formulae derived in our previous paper, we can partition
the total variance for each set of assessments as follows (Table I: for the detailed
working see loc. cit., IX, pp. 124f).


ions Separately (if required) 


TABLE I. ANALYSIS OF VARIANCE

                                      Crude        Adjusted
                                    Test Marks    Assessments
Heredity                              77.06          87.56
Environment: random effects            5.91           5.77
Environment: systematic effects       10.60           1.43
Reduced Total                         93.57          94.76
Unreliability                          6.43           5.24
Total                                100.00         100.00

Analysis by Vectors. We now propose to regard the main components as
vector quantities (i.e. quantities defined solely by their magnitude and direction)
and to treat the observed assessments as a resultant obtained by combining the
components in accordance with the ‘parallelogram law’ for the addition of
vectors, on the analogy of the resultant and component forces in mechanics.

In the problem we are examining here we are not concerned with the
effects of unreliability. These form a random variable uncorrelated with
the remaining tendencies, and would therefore have to be represented in a third
dimension. Here we shall be concerned solely with an attempt to partition
what we may call the ‘reduced total’, i.e. the total left after the effects of
unreliability have been deducted. Accordingly, on the basis of the first three
percentages shown in Table I we may now re-analyse the data in the following
way.

So far as direct causation is concerned the variations in a child’s genetic 
constitution are completely independent of the variations in his environment. 
But, as a matter of empirical fact, it is always conceivable that in any selected 
group the variations of the two factors may be in some degree correlated, possibly 
owing to some indirect process of causation. Let us therefore begin by taking 
as our two rectangular axes, first the horizontal line OH, to represent, by its 
direction and length, the relative influence of heredity (H), and a second line
perpendicular to it, namely, the vertical axis, OR, to represent the influence of 
environment (R) in so far as it is uncorrelated with heredity (see Fig. 1). Their 
lengths will accordingly be drawn proportional to the square roots of the corre- 
sponding variance, i.e. to the standard deviations. 

We now require a supplementary component to represent environment, 
in so far as it operates in the same direction as heredity. This can be represented 
by the line RS parallel to OH. The joint effects of these two environmental 
influences will then be indicated by a line, OS. Evidently the observed test- 
marks, OT, will be found by completing the parallelogram, OSTH. 

Problem. As analysed in Table I, the results obtained yield figures indicating
(i) the length of the base, OH, (ii) the length of the diagonal, OT, and (iii) the
height of the parallelogram, OR. The point T can therefore be determined by
describing a circle with centre at O and radius OT cutting RT in T. Hence our
main task will be to determine the lengths of the lines RS and OS (which are
needed to measure the influence of the components S and J and can be
calculated by Euclid I, 47) and the angles between the various vectors (which
are needed to measure the various correlations).

Working Procedure. The suggested steps in the calculations are simple
if a little lengthy. To begin with, we find the lengths of the lines OH, OR, and
OT by taking the square roots of the variances: e.g. for the crude test
measurements (col. 1 of Table I), OH = √(77.06) = 8.7784; OR = √(5.91) = 2.4305;
OT = √(93.57) = 9.6732.

Since cos ROT = OR/OT = 0.2513, we can at once determine the angle
ROT (75° 26′) and its sine (0.9679); and, since RT = OT sin ROT, we can now
calculate RS = RT - ST = RT - OH = 9.3627 - 8.7784 = 0.5843. The
remaining steps are obvious. We compute in turn: tan ROS = 0.2403, angle
ROS = 13° 29′, cos ROS (r_rs) = 0.9724, OS = 2.5002, angle SOT = 61° 57′,
cos SOT (r_st) = 0.4703, angle SOH = 76° 31′, cos SOH (r_sh) = 0.2332, angle
TOH = 14° 34′, cos TOH (r_th) = 0.9679. The square of OS gives the variance
of the joint environmental effects, viz. 2.5002² = 6.2514; and the last three cosines
specify the three correlations which we chiefly desire to know, namely, those
between (i) environment and the tests, 0.4703,
(ii) the systematic effects of the environment and heredity, 0.2332, and
(iii) heredity and the tests, 0.9679.
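
The working procedure can be followed with coordinates instead of trigonometric tables. The sketch below is an illustration, not the authors' own computation: it places O at the origin with OH along the horizontal axis and OR along the vertical, so that S = (RS, OR) and T = (RT, OR), and takes each correlation as the cosine of the angle between the corresponding vectors. The function name and the dictionary keys are hypothetical. The results agree with the figures quoted in the text to about three decimal places; small differences in the fourth place are to be expected, since the text rounds the angles to the nearest minute.

```python
import math


def vector_analysis(var_h, var_r, var_t):
    """Parallelogram analysis from three variances of Table I.

    var_h: heredity; var_r: random environmental effects;
    var_t: reduced total (total less unreliability).
    """
    oh = math.sqrt(var_h)            # base OH
    ot = math.sqrt(var_t)            # diagonal OT
    rt = math.sqrt(var_t - var_r)    # horizontal coordinate of T (Euclid I, 47)
    rs = rt - oh                     # since ST is equal and parallel to OH
    os_ = math.hypot(rs, math.sqrt(var_r))  # length OS, joint environmental effects
    return {
        "RS": rs,
        "OS": os_,
        "r_st": (rs * rt + var_r) / (os_ * ot),  # cos SOT
        "r_sh": rs / os_,                        # cos SOH
        "r_th": rt / ot,                         # cos TOH
    }


crude = vector_analysis(77.06, 5.91, 93.57)
for name, value in crude.items():
    print(f"{name} = {value:.4f}")
```

Calling the same function with the second column of Table I, `vector_analysis(87.56, 5.77, 94.76)`, gives the corresponding quantities for the adjusted assessments.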


Fig. 1. Heredity (OH), Environment (ST), and their Resultant (OT).

Alternative Analysis of Variance. We can now re-analyse the variance of the
reduced total, $\sigma_t^2$, in terms of the chief contributory factors just specified.
Writing $\sigma_h^2$ for the variance contributed by heredity, $\sigma_j^2$ for the variance of the
joint environmental effects, and $r_{hj}$ for the correlation between the two, we have,
by the familiar formula,

$$\sigma_t^2 = \sigma_h^2 + \sigma_j^2 + 2\, r_{hj}\, \sigma_h \sigma_j ;$$

that is, for the Crude Test Marks,

$$(8.7784)^2 + (2.5002)^2 + 2 \times 0.2332 \times 8.7784 \times 2.5002 = 77.06 + 6.25 + 10.26 = 93.57.$$

In the same way we obtain for the Adjusted Assessments,

$$(9.3574)^2 + (2.4031)^2 + 2 \times 0.0317 \times 9.3574 \times 2.4031 = 87.56 + 5.78 + 1.42 = 94.76.$$

                                            Crude        Adjusted
                                          Test Marks    Assessments
Heredity                                    77.06          87.56
Environment                                  6.25           5.78
Effects of correlation between heredity
  and environment                           10.26           1.42
Total                                       93.57          94.76
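
The re-analysis can be reproduced for both columns from the Table I variances alone. The following sketch is offered as a check under the same geometrical assumptions as the text (OR perpendicular to OH, RS parallel to OH); the function name is hypothetical. Because $r_{hj}\sigma_j$ equals the length RS, the correlation term reduces to twice RS times $\sigma_h$, and the three parts necessarily sum to the reduced total.

```python
import math


def alternative_analysis(var_h, var_r, var_t):
    """Split the reduced total into heredity, environment, and their correlation term."""
    sigma_h = math.sqrt(var_h)
    rs = math.sqrt(var_t - var_r) - sigma_h   # systematic component RS
    var_j = var_r + rs ** 2                   # OS squared: joint environmental variance
    sigma_j = math.sqrt(var_j)
    r_hj = rs / sigma_j                       # cosine of the angle SOH
    correlation_term = 2.0 * r_hj * sigma_h * sigma_j
    return var_h, var_j, correlation_term


for label, column in (("Crude Test Marks", (77.06, 5.91, 93.57)),
                      ("Adjusted Assessments", (87.56, 5.77, 94.76))):
    heredity, environment, joint = alternative_analysis(*column)
    total = heredity + environment + joint
    print(f"{label}: heredity {heredity:.2f}, environment {environment:.2f}, "
          f"correlation term {joint:.2f}, total {total:.2f}")
```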




This procedure enables us to subdivide the total variance in such a way as to 
show not only the proportion contributed (i) by genetic influences and (ii) by 
environmental influences (both ‘ systematic ’ influences, i.e. those operating in 
the same direction as heredity, and ‘ random ’, i.e. those operating in statistical 
independence), but also (iii) the contribution due to the joint effects arising from 
the fact that the two main components may be more or less correlated. We 
observe that, for the crude test measurements, this third contribution is fairly 
large—over 10 per cent, whereas for the adjusted assessments it is almost 
negligible, amounting to between 1 and 2 per cent only. 

Empirical Verification. From the initial figures for the component variances, 
it will be noted, we are able by this method to deduce an estimate for the corre- 
lation between environmental influences, on the one hand, and the assessments 
for intelligence, on the other. Taking the crude test marks, we obtain a
correlation of 0.470; and on substituting the adjusted assessments we obtain 0.277.
It should be possible to check these two theoretical estimates by comparing them 
with the empirical coefficients. All that we require are objective assessments for 
the relevant environmental conditions. It was impossible to procure the 
necessary figures in the case of every individual child in the entire group. But, 
as noted in our previous paper, we were able to secure trustworthy ratings for 
the ' socio-economic status ’ of a large number of households, which together 
form a fairly representative sample. These include ratings for (a) the material 
(i.e. financial and hygienic) conditions of each family, and (b) the cultural (i.e. 
educational and motivational) background. 

(a) For the material environment the correlations with the assessments of
intelligence are (i) with the crude test marks 0.288 and (ii) with the adjusted
assessments 0.226. (b) For the cultural environment the correlations are (i) with
the crude test-results, 0.453, and (ii) with the adjusted assessments, 0.315.
These latter coefficients agree quite well with the theoretical values we have
deduced; and they thus provide an added confirmation for the validity of the
analysis.

Causal Influences and Path-Coefficients. Finally, we may note that the square
roots of the two variances, represented by the lines OH and OS, may be written
$\beta_{th.e}$ and $\beta_{te.h}$, and can be regarded as the partial regression coefficients of the
actual assessments (i.e. in the diagram, of the test results) on heredity and
environment respectively. Sewall Wright terms such quantities ‘path-coefficients’;
these two, it should be observed, are ‘one-way’ path-coefficients. The
correlation between heredity and environment, $r_{he}$, he would also regard as a ‘path
coefficient’: but it is essentially a ‘two-way’ coefficient. With this
interpretation we can think of heredity as influencing (or as capable of influencing) the
test-results along two paths simultaneously: first, by the direct path represented
by the regression coefficient $\beta_{th.e}$, and secondly by an indirect path, leading first
from heredity to environment (as indicated by the presence of a significant
correlation between them), and thence by the path leading from environment to
the test-results as indicated by the other regression coefficient, $\beta_{te.h}$. An
analogous interpretation may be adopted, at least as a theoretical possibility, in
regard to environment; environment may influence the test-results both directly
and indirectly, the indirect influence passing first through the same two-way path
as before, but in the opposite direction.

However, when we attempt to examine causal influences in this concrete 
way, it seems scarcely satisfactory to stop at this point. There are two obvious 
questions which still remain undecided. Both are questions which statistical 
analysis by itself is utterly incompetent to solve. 

In the first place, as we have noted, environment may in some degree operate, 
or tend to operate, in the same direction as heredity. Suppose we could show 
that this effect occurred merely among those families where the parents, in virtue 
of their own inheritable intelligence, had been able to provide a better cultural 
environment for their children; many would be tempted to argue that, instead of 
crediting this effect to environment, we ought to regard it as yet another instance 
in which heredity is the primary cause. Whether such a statement is reasonable 
would depend on how the initial question itself was phrased; and no doubt, in 
the sweeping form we have just used, the conjecture would almost certainly be 
misleading : sometimes parents of no great intelligence do succeed, either by 
making financial sacrifices or (in times past) by the aid of inherited wealth, 
in providing exceptional intellectual opportunities for their children; these 
opportunities would almost inevitably increase the apparent intelligence of their 
children (though perhaps not so much as is commonly believed). This is not the
place to discuss such issues in and for themselves. We mention them to show 
that they are of a type which can only be decided by specific investigations, and 
that in such investi y merely an auxiliary part. 

Secondly, ondents indicate, many feel 
that it would called "the effects of the 


In any case the results recorded in the last of 
reasonably adequate answer to the limited qu 
these complex supplementary issues must be 


* Heilman, J. D., ‘The Relative Influence upon Educational Achievement of Some Hereditary
and Environmental Factors’, XXVIIth Yearbook, Nat. Soc. Study Educ. (1928), pp. 35-65; cf. id.,
‘Factors determining Achievement and Grade Location’, J. Genet. Psychol., XXXVI, 1929,
pp. 435-456.


Vol. X The British Journal of Statistical Psychology November 
Part II 1957 


THE THEORY AND USE OF THE SUCCESS-RATIO 


By DAVID A. WALKER
Deputy Director of Education, Fife 


Abstract. The success-function defined in this paper has been found useful 
in follow-up studies of school children where entrance marks have been normally 
distributed. It provides a rapid method of estimating the product-moment 
correlation between entrance mark and success when the criterion for success is of
the pass-or-fail type. 


I. INTRODUCTION 

No system of selecting pupils for courses of secondary education is, or can be, 
completely successful while children preserve their right to develop in their own 
ways and at their own speeds. But in predicting success in the secondary school 
different batteries of tests show different degrees of efficiency; and much of the 
research in this field has consisted of follow-up studies undertaken to compare the 
validities of tests and other measures used at the selection stage. The concept of 
success-ratio is one that does not appear to have been used to its full extent; and
the present paper is an attempt to develop its theory and application.

The assumptions that must be satisfied if a full use is to be made of it are:
(i) that the entrance marks of the whole group under consideration are normally 
distributed; (ii) that success (or its lack) is a normally distributed variable, those 
pupils above a certain level in that attribute being successful, while those below 
that level are unsuccessful; (iii) that the correlation between entrance mark and 
success is linear. 

The applicability of the first assumption to any set of data can readily be 
tested, while the second and third assumptions can be checked by the methods 
described in this paper. It will be observed that the question of correction for 
attenuation is not raised. The method thus avoids the difficulties both of suitable 
assumptions and of additional calculation experienced by previous experimenters 
(e.g. W. G. Emmett [1], W. McClelland [2]). 


II. THEORY OF THE SUCCESS-RATIO


If the assumptions just stated are satisfied, it is possible to calculate the
expected value of the success-ratio for any given entrance mark. Let x represent
the entrance mark, measured in units of standard deviation from zero mean. Let
y represent the success-failure variable, similarly measured. Let y = k be the line
dividing success from failure, values of y greater than k representing success.
The correlation surface is

$$z = \frac{1}{2\pi\sqrt{1-r^2}} \exp\left\{-\frac{x^2 - 2rxy + y^2}{2(1-r^2)}\right\},$$

where r is the correlation between the variables x and y. For a given value of x,
the frequency of success is therefore

$$\int_k^\infty z\,dy = \frac{1}{2\pi\sqrt{1-r^2}} \int_k^\infty \exp\left\{-\frac{x^2 - 2rxy + y^2}{2(1-r^2)}\right\} dy,$$

which can be shown to equal

$$\frac{\exp(-\tfrac{1}{2}x^2)}{2\pi\sqrt{1-r^2}} \int_k^\infty \exp\left\{-\frac{(y-rx)^2}{2(1-r^2)}\right\} dy.$$

Now $\exp(-\tfrac{1}{2}x^2)/\sqrt{2\pi}$ is the total number of candidates with mark x in
the group. Hence

$$\frac{\text{Number successful at level } x}{\text{Total number at level } x} = \frac{1}{\sqrt{2\pi}\,\sqrt{1-r^2}} \int_k^\infty \exp\left\{-\frac{(y-rx)^2}{2(1-r^2)}\right\} dy.$$

This fraction may be termed the ‘success-ratio’ at x.

Let $(y - rx)/\sqrt{1-r^2} = -u$, so that $dy = -\sqrt{1-r^2}\,du$. The limits of
integration are then $u = (rx-k)/\sqrt{1-r^2}$ at $y = k$, and $u = -\infty$ at $y = \infty$.

Hence the success-ratio at level x

$$= \frac{1}{\sqrt{2\pi}\,\sqrt{1-r^2}} \int_{(rx-k)/\sqrt{1-r^2}}^{-\infty} \left(-\sqrt{1-r^2}\right) \exp(-\tfrac{1}{2}u^2)\,du
= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{(rx-k)/\sqrt{1-r^2}} \exp(-\tfrac{1}{2}u^2)\,du. \qquad (1)$$

It is therefore the integral of a normally distributed variable with zero mean and
unit variance between the limits $-\infty$ and $(rx-k)/\sqrt{1-r^2}$. In other words, it is
a function of r, x and k only. Consider this as a function of x. Put

$$u = \frac{ru' - k}{\sqrt{1-r^2}}, \qquad du = \frac{r}{\sqrt{1-r^2}}\,du'.$$

The limits of $u'$ are now $-\infty$ and x, and the success-ratio at level x

$$= \frac{r}{\sqrt{2\pi}\,\sqrt{1-r^2}} \int_{-\infty}^{x} \exp\left\{-\frac{(ru'-k)^2}{2(1-r^2)}\right\} du', \qquad (2)$$

which is the integral of a normally distributed variable $u'$ with mean at
$(ru'-k)/\sqrt{1-r^2} = 0$, i.e. $u' = k/r$, and standard deviation $\sigma' = \sqrt{1-r^2}/r$,
both measured in σ units from mean of ellipse.

This function with mean at $k/r$ and standard deviation equal to $\sqrt{1-r^2}/r$ we
shall term the ‘success-function’. The theorem established is that the success-
ratio at any entrance mark, x, standardized on the whole group is the integral of the
success-function from $-\infty$ to x.
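
The theorem can be checked numerically. The Python sketch below is an illustration added for the reader, not part of the original derivation, and its function names are hypothetical. It evaluates the success-ratio once from eqn. (1) and once as the cumulative success-function with mean k/r and standard deviation √(1 − r²)/r, assuming that r lies strictly between 0 and 1.

```python
from math import sqrt
from statistics import NormalDist


def success_ratio_eqn1(x, r, k):
    """Eqn. (1): unit-normal integral from -infinity to (r*x - k)/sqrt(1 - r^2)."""
    return NormalDist().cdf((r * x - k) / sqrt(1.0 - r * r))


def success_ratio_from_success_function(x, r, k):
    """Integral up to x of the success-function (mean k/r, s.d. sqrt(1 - r^2)/r)."""
    success_function = NormalDist(mu=k / r, sigma=sqrt(1.0 - r * r) / r)
    return success_function.cdf(x)


if __name__ == "__main__":
    # x and k are in standard-deviation units of the whole group.
    for x in (-1.0, 0.0, 1.25, 2.0):
        a = success_ratio_eqn1(x, r=0.8, k=1.0)
        b = success_ratio_from_success_function(x, r=0.8, k=1.0)
        print(f"x = {x:5.2f}   eqn (1): {a:.6f}   success-function: {b:.6f}")
```

The two columns should agree to within floating-point error, which is all the theorem asserts.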




III. APPLICATIONS 


(a) The rapid estimation of success-ratios. This application is most readily
illustrated by an example. Suppose that the entrance marks are I.Q.s centred on
100 with standard deviation 15, and that the correlation between I.Q. and success
is 0.8; suppose also that the line dividing success from failure is at k = 1 S.D.,
i.e. 16 per cent succeed.

Then the success-function is the probability function with mean at
k/r = 1/0.8 = 1.25, corresponding to an I.Q. of 100 + 1.25 × 15 or 119, and standard
deviation of $\sqrt{1-r^2}/r$ = 0.75, corresponding to an I.Q. range of 0.75 × 15, i.e. 11.

From this distribution the theoretical success-ratio at any precise I.Q. level
can be estimated. For example, pupils with I.Q. 119 have a success-ratio
of 0.50; those with I.Q. 130, i.e. 11 points above this mean, have a success-ratio of
0.84; those with I.Q. 108 have a success-ratio of 0.16, and so on. These values may
be read from tables of the normal distribution, or from a straight line drawn on
probability paper.
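
In place of tables or probability paper, the same reading can be taken from a few lines of Python. This is a minimal sketch for illustration only; the function name and its default arguments are assumptions, not anything defined in the paper.

```python
from math import sqrt
from statistics import NormalDist


def success_function(r, k, mean=100.0, sd=15.0):
    """Success-function on the raw mark scale, returned as a normal distribution.

    r is the correlation between entrance mark and success; k is the dividing
    line between success and failure, in standard-deviation units.
    """
    mu = mean + sd * k / r                  # mean at k/r, converted to marks
    sigma = sd * sqrt(1.0 - r * r) / r      # s.d. of sqrt(1 - r^2)/r, in marks
    return NormalDist(mu, sigma)


sf = success_function(r=0.8, k=1.0)
print(round(sf.mean, 2), round(sf.stdev, 2))   # 118.75 11.25

# The success-ratio at a given I.Q. is the cumulative success-function there.
for iq in (108, 119, 130):
    print(iq, round(sf.cdf(iq), 2))
```

The unrounded mean and standard deviation (118.75 and 11.25) are used here, so the ratios printed for I.Q. 108 and 119 may differ in the second decimal place from the figures in the text, which rest on the rounded values 119 and 11.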


TABLE I. SUCCESS-RATIOS OF 1949 GROUP AT UNIVERSITY ENTRANCE LEVEL

Entrance         Number       Number gaining     Success-
Mark             in Group     [...]              Ratio

100 and over          8             5            0.625
95-99                29            12            0.414
90-94               105            27            0.257
85-89               225            32            0.142
80-84               377            10            0.026
75-79               491             3            0.006
70-74               514             2            0.004
Below 70           2581             0              -
Total              4330            91              -

Mean Mark 65.3. Standard Deviation 14.3.


(b) Estimation of correlation from success-ratios. The usual situation is that
the success-ratios are known, but not the correlation between entrance mark and
success; and the experimenter wishes to estimate the correlation from the data.
The discussion will be clearer if an example is given from actual practice. In 1949,
4,330 pupils in a Scottish county were transferred to courses of secondary
education, their courses having been partly determined by their performance in a
battery of tests and other measures; five years later, 91 had obtained a
Leaving Certificate of a sufficiently high standard to admit them to a
University. The performance of the whole group is shown in Table I. To save
space the lower part of the table has been telescoped; the distribution of entrance
marks was in fact roughly, though not completely, normal.

Method A. From the theory given above it is obvious that an estimate of the
correlation coefficient r can be obtained from each success-ratio separately,
provided that k is known. For example, the second group, which has a mean
entrance mark of 97, has a success-ratio of 0.414. The upper limit of the integral
in eqn. (1) is therefore given by

$$(rx - k)/\sqrt{1-r^2} = -0.217. \qquad (3)$$

Now x = (97 − 65.3)/14.3 in standardized units; and k is the line cutting off
91/4330 from the tail of the normal curve, i.e. k = 2.034. (Inspection of Table I
suggests that no suitable candidate has been excluded from the successful group
by the operation of the pass mark. This point will be considered in greater detail
in a later section.) The solution of eqn. (3) (taking the positive value of the root
since it represents a standard deviation) is r = 0.87. Estimates of r could be
obtained in the same way from the other entrance levels, and these could
be combined. The most accurate way of doing this is shown under Method C
below.
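
The solution of eqn. (3) can also be found numerically. The sketch below is an illustrative assumption rather than the author's procedure: it uses bisection instead of solving the quadratic, and the function name is hypothetical. It relies on the fact that, when x > k > 0, the quantity (rx − k)/√(1 − r²) increases steadily with r between 0 and 1, so that the root is unique.

```python
from math import sqrt
from statistics import NormalDist


def estimate_r(success_ratio, x, k, iterations=200):
    """Solve (r*x - k)/sqrt(1 - r^2) = normal deviate of success_ratio for r.

    x and k are in standard-deviation units. Bisection on 0 < r < 1; the sketch
    only handles the case x > k > 0, where the left-hand side increases with r.
    """
    if not (x > k > 0.0):
        raise ValueError("this sketch assumes x > k > 0")
    target = NormalDist().inv_cdf(success_ratio)

    def f(r):
        return (r * x - k) / sqrt(1.0 - r * r) - target

    lo, hi = 0.0, 1.0 - 1e-12
    if f(lo) > 0.0:
        raise ValueError("no root with 0 < r < 1 for these values")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Second group of Table I: mean entrance mark 97, success-ratio 0.414.
x = (97 - 65.3) / 14.3
k = NormalDist().inv_cdf(1 - 91 / 4330)   # line cutting off 91/4330 in the upper tail
r = estimate_r(0.414, x, k)
print(round(k, 3), round(r, 2))
```

The value of r printed should be close to the 0.87 quoted above.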

Method B. The most rapid method of estimating the correlation coefficient
from a set of success-ratios is by the use of the relation between r and the standard
deviation of the success-function. If the success-ratios are plotted on probability
paper against the mid-points of the entrance-mark intervals, the points should lie
on a straight line if the assumptions underlying the method are satisfied. From this
straight line, the entrance mark for which the success-ratio is 0.50 and that for
which the success-ratio is 0.16 are read off. These give the mean of the success-
function and the point one σ unit below the mean. The difference between these
[...] success-function. When it is [...]; and a line of best fit may be [...]
Table I, it is found that a success-ratio of 0.50 occurs at entrance mark 98.9
and one of 0.16 at entrance mark 89.5. Hence

$$\sqrt{1-r^2}/r = (98.9 - 89.5)/14.3,$$

i.e. r = 0.86.

Alternatively, we may [...] normal deviates; and the graph [...]
If the assumptions are satisfied, the graph is [...]
standard deviation of the success-function corresponds [...] deviate.

Method C. For a more exact method of estimating the correlation coefficient
from a set of success-ratios we must fit the best line [...]
each set of observations the [...]
devised by Bliss (cf. Fisher and Yates, Statistical Tables for Biological, Agricultural
and Medical Research, Introduction to Tables IX and XI). [...]
with the success-ratios of pupils with different entrance marks. Bliss's
method not only enables us to calculate the slope of the line relating the normal
deviate to entrance mark, and so to calculate the correlation underlying the
success-ratios, but also supplies a value for chi-squared indicating whether the
assumptions of normality and of linear correlation are in agreement with the data.

Details of the calculations are given in the Appendix below. When the
method was applied to the data of Table I, the correlation coefficient obtained was
0.85. The value of chi-squared was 5.8, giving a probability of at least 0.5 that
the deviations could be attributed to sampling errors.
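
For Method B, the step from the two marks read off the probability paper to the correlation is a matter of inverting s = √(1 − r²)/r, which gives r = 1/√(1 + s²). The Python sketch below illustrates that step only; the function name is hypothetical, and the marks used are those of the I.Q. example in Section III(a), not readings from Table I.

```python
from math import sqrt


def r_from_marks(mark_at_50, mark_at_16, sd_marks):
    """Correlation implied by the marks at which the success-ratio is 0.50 and 0.16.

    The difference between the two marks, divided by the standard deviation of
    the entrance marks, is taken as s = sqrt(1 - r^2)/r, so r = 1/sqrt(1 + s^2).
    """
    s = (mark_at_50 - mark_at_16) / sd_marks
    return 1.0 / sqrt(1.0 + s * s)


# Success-function of Section III(a): mean 118.75, one sigma unit below at 107.5,
# on an I.Q. scale with standard deviation 15.
print(r_from_marks(118.75, 107.5, 15.0))   # 0.8
```

Strictly, a success-ratio of 0.16 lies very slightly less than one standard deviation below the mean (the exact figure at one standard deviation is 0.159), a difference that is negligible for a graphical method.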


IV. THE EFFECT OF THE PASS-MARK

If entrance to the course is limited to those with entrance marks x > h, then
the successful candidates will be those in the correlation ellipse for whom x > h,
y > k. The theory outlined in section II of this paper applies only to those for
whom x > h, and modifications must be made in the methods of estimating r.
Method A is no longer applicable, as k is not known. Methods B and C can still
be used for all groups above the admission mark.
Let us apply Method C to the data of Table II, which shows the number of
pupils from the 1949 group who obtained a certificate of any type after six years of
secondary education. The admission mark to the course was 75, but some pupils
with lower marks were admitted on appeal and others transferred to the Leaving
Certificate course after having done well in other courses. A dotted line is drawn
across the table to separate these candidates from the others.


TABLE II. SUCCESS-RATIOS OF 1949 GROUP IN MINIMUM CERTIFICATE AFTER 6 YEARS

Entrance         Total Number     Number         Success-
Mark             in Group         Successful     Ratio

100 and over           8               8         1.000
95-99                 29              19         0.655
90-94                105              56         0.533
85-89                225             102         0.453
80-84                377              91         0.241
75-79                491              56         0.114
. . . . . . . . . . . . . . . . . . . . . . . . . . . . .
70-74                514              10         0.020
65-69                543               2         0.004
60-64                568               1         0.002
Below 60            1470               0         0.000


When Method C is applied to the figures above the dotted line, the value of r
obtained is 0.79 and the value of chi-squared is about 5.5, giving a probability of
0.25 that the deviations are within the bounds of sampling errors. Let us now
suppose that no candidate below the pass mark of 75 had been admitted to the
course. The numbers successful in the class intervals 70-74 and below would then
have been zero in each case. The application of Method C would furnish a correlation
coefficient of 0.86; but the value of chi-squared would become [...] and
the consequent probability about 0.001. The deviations would be too great to be
attributed to sampling errors and the assumptions, e.g. linearity of [...],
would be no longer tenable. No importance could be attached to the correlation
so obtained.

It is interesting, though rather irrelevant to the main purpose of this article,
to note that when the actual numbers of successes are used, the value of r obtained
from the whole table is 0.84, and the value of chi-squared is 12.7, which corresponds
to a probability of 0.20. The flexibility of the selection scheme has therefore
restored the position. The correlation obtained from the whole table is
greater than that obtained from the part above the line, but it is not possible to
state whether the difference is significant, as the standard errors of correlation
coefficients obtained by these methods have not been worked out.


V. SUMMARY


1. From a set of success-ratios there can be derived a success-function which
can be used to predict other success-ratios and also to estimate the correlation
between entrance mark and success.


2. Methods of estimating the correlation between entrance mark and success
are given and illustrated by examples.


3. The advantages of the use of the success-function in estimating correlations
are:


(i) the assumptions are few, simple, and easily tested;
(ii) the intricacies of correcting correlation coefficients for attenuation are
avoided.


APPENDIX

[...] points of the class intervals of the entrance [...]
x, chosen so that x = 0 at entrance mark 87. [...]
appropriate probit, i.e. the normal deviate
increased by 5 to avoid negative values. These probits may be obtained from a
table of the normal distribution; a convenient table is provided in Fisher and Yates’ [...]
y may be described as [...] in Column 5 of Table III. [...]
experimental probits against the [...] From this line we obtain [...]
equation. The provisional [...] values, occurring in Fisher and Yates’ Table XI. [...]
the working probit y. This [...] ratio times the range. For [...]
S(wxy), S(wx²), and S(wy²), and from [...]


$$\bar{x} = S(wx)/S(w) = -0.3408;$$
$$\bar{y} = S(wy)/S(w) = 3.593;$$

$$A = Sw(x-\bar{x})^2 = S(wx^2) - \bar{x}\,S(wx) = 537.549;$$
$$B = Swy(x-\bar{x}) = S(wxy) - \bar{x}\,S(wy) = 305.708;$$
$$C = Sw(y-\bar{y})^2 = S(wy^2) - \bar{y}\,S(wy) = 179.684.$$



TABLE III. DATA FOR DETERMINING b AND χ²

Entrance          Total     Number       Experimental   Provisional   Working     Weight
mark         x    Number    Successful   Probit         Probit        Probit y    w

100-104      3       8          5           5.3            5.2        5.3169       5.016
95-99        2      29         12           4.8            4.7        4.7831      17.864
90-94        1     105         27           4.3            4.2        4.3563      52.815
85-89        0     225         32           3.9            3.7        3.9650      75.600
80-84       -1     377         10           3.1            3.2        3.0809      67.860
75-79       -2     491          3           2.5            2.7        2.5371      37.120
70-74       -3     514          2           2.3            2.2        2.3698      12.644
65-69       -4     543          0            -             1.7        1.4194       3.334
60-64       -5     568          0            -             1.2        0.9522       0.670


The slope of the line relating probit to entrance mark is b = B/A = 0.5687.
Unit change in probit therefore corresponds to a change of 1/0.5687 in entrance
mark. This must be standardized in terms of the class interval of 5 that has been
used and of the standard deviation of the whole group of entrance marks, i.e. 14.3.
The corresponding change of entrance marks is therefore 5/(0.5687 × 14.3) or
0.6148, which thus represents the standard deviation of the success-function.
This, according to the theory, equals $\sqrt{1-r^2}/r$, from which r = 0.85.
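
The final stage of the calculation can be retraced from the working probits and weights of Table III. The Python sketch below is an illustration under stated assumptions, not the author's worksheet: the function names are hypothetical, the value x = 3 for the top class is inferred from the stated origin (x = 0 at entrance mark 87), and only the last weighted regression is reproduced, not the preceding steps of Bliss's method that yield the working probits and weights.

```python
from math import sqrt

# (x, working probit y, weight w) for each class interval, as read from Table III.
TABLE_III = [
    (3, 5.3169, 5.016),
    (2, 4.7831, 17.864),
    (1, 4.3563, 52.815),
    (0, 3.9650, 75.600),
    (-1, 3.0809, 67.860),
    (-2, 2.5371, 37.120),
    (-3, 2.3698, 12.644),
    (-4, 1.4194, 3.334),
    (-5, 0.9522, 0.670),
]


def weighted_probit_line(rows):
    """Weighted regression of working probit on x; returns slope b and C - b*B."""
    sw = sum(w for _, _, w in rows)
    swx = sum(w * x for x, _, w in rows)
    swy = sum(w * y for _, y, w in rows)
    swxx = sum(w * x * x for x, _, w in rows)
    swxy = sum(w * x * y for x, y, w in rows)
    swyy = sum(w * y * y for _, y, w in rows)
    xbar, ybar = swx / sw, swy / sw
    a_sum = swxx - xbar * swx     # A = Sw(x - xbar)^2
    b_sum = swxy - xbar * swy     # B = Swy(x - xbar)
    c_sum = swyy - ybar * swy     # C = Sw(y - ybar)^2
    slope = b_sum / a_sum
    chi_squared = c_sum - slope * b_sum
    return slope, chi_squared


def r_from_slope(slope, class_interval, sd_marks):
    """Convert the probit slope per class interval into the correlation r."""
    s = class_interval / (slope * sd_marks)   # s.d. of the success-function
    return 1.0 / sqrt(1.0 + s * s)


b, chi2 = weighted_probit_line(TABLE_III)
r = r_from_slope(b, class_interval=5.0, sd_marks=14.3)
print(round(b, 3), round(chi2, 2), round(r, 2))
```

Because the tabulated probits and weights are rounded, the sums obtained in this way may differ slightly in the later decimals from those printed above; the slope should nevertheless come out near 0.57 and r near 0.85.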

Also χ² = C − bB = 5.83. The appropriate number of degrees of freedom
is 7, since two constants have been fitted, and the corresponding value of P is
between 0.5 and 0.6.
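
Where a table of chi-squared is not to hand, the value of P can be computed directly. The following Python sketch is an illustrative addition with a hypothetical function name; it uses the standard finite-series expressions for the upper tail of the chi-squared distribution with a whole number of degrees of freedom.

```python
from math import erfc, exp, pi, sqrt


def chi2_upper_tail(chi2, df):
    """Probability that a chi-squared variable with df degrees of freedom exceeds chi2.

    df must be a positive integer.
    """
    if chi2 <= 0.0:
        return 1.0
    half = chi2 / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum of h^j / j! for j = 0 .. df/2 - 1, with h = chi2/2.
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return exp(-half) * total
    # Odd df: twice the normal upper tail at chi, plus twice the normal ordinate
    # times the sum of chi^(2m-1) / (1*3*...*(2m-1)) for m = 1 .. (df-1)/2.
    chi = sqrt(chi2)
    normal_tails = erfc(chi / sqrt(2.0))
    ordinate = exp(-half) / sqrt(2.0 * pi)
    term, total = chi, 0.0
    for m in range(1, (df - 1) // 2 + 1):
        total += term
        term *= chi2 / (2 * m + 1)
    return normal_tails + 2.0 * ordinate * total


print(round(chi2_upper_tail(5.83, 7), 2))
```

For χ² = 5.83 with 7 degrees of freedom the result falls between 0.5 and 0.6, in agreement with the value read from the tables.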


REFERENCES


[1] EMMETT, W. G. (1945). An Inquiry into the Prediction of Secondary School Success.
London: University of London Press.
[2] McCLELLAND, W. (1942). Selection for Secondary Education. London: University
of London Press.


JOURNAL COMMITTEE 


As instructed by the Council of the British Psychological Society, the Journal
Committee, at its meetings in 1956, carefully re-examined the questions of title,
typography, and price; and now wishes to thank the numerous contributors and
readers of the journal who helped by communicating their views. In its final
recommendations the Committee strongly supported the new arrangements for
printing, and urged that there should be no change either in the title or in the
price of the journal. It suggested that the arguments of those critics who found
the articles too technical might be partly met if contributors could be persuaded
to submit papers on a wider range of topics, e.g. experimental, elementary
or expository articles, and notes or abstracts from these.

Council deferred a full consideration of the Committee’s recommendations until its
meeting on July 6, 1957, i.e. until after the first issue of the current year had been published.
The decisions then reached were reported by the Editor to the Committee in his review
for the first half year of 1957 (to July 31), which the Committee recommended should be
published.

The report stated that the changes in typography, which the Editor had proposed in
1954 (11-pt Imprint with 1-pt leading, and abolition of small type) had now been sanctioned.
Largely as a result, the previous deficit had been converted into a profit; and accordingly,
after some discussion, Council had agreed that the price should at present not be raised.

Council, however, feared that subscribers might now feel that they were getting an
inadequate return for their subscriptions, since increase in type-size and spacing, following
on the temporary reduction to two issues a year, would mean that less material could be
presented in an issue of 64 pages. The Editor explained that it was now proposed to
return to the plan of issuing three parts a year, and that a benefactor had guaranteed a
fund to cover any loss that could not be met out of the profits. Part III would include
an Index for the first ten volumes. The alternative would be to increase the number of
pages in each part. The former proposal was accepted by Council. But the Editor would
still welcome the views of readers.

Since the Publications Committee had recommended that editors should serve for a
limited period of years only, Council had nominated Dr. Whitfield as his successor.

It was agreed that a formal expression of thanks should be recorded to the American
benefactor for her aid, to contributors and readers for their continued help and support,
and to all those (including the new printers and publishers) who had enabled the journal
to reach a position where no serious deficit need any longer be feared.

Correspondents are asked to note that the Officers of the Society are no longer the
Publishers of the Journal. Communications regarding material for publication should be
sent to the Editor; questions relating to business matters should be addressed to the
Managing Editor.


CHARLOTTE BANKS,
Secretary.




CRITICAL NOTICE 


Secondary School Selection : A British Psychological Society Inquiry. 
Edited by P. E. Vernon. London: Methuen & Co., Ltd., 1957. Pp. 216. 
15s. 


This survey will be widely welcomed for its summary of recent literature 
on the important issues raised : and the conclusions put forward may no doubt 
be taken as representing the general views of British psychologists on one of the 
most heated controversies of the day¹. The book deserves the special attention
of the statistical psychologist, not only because it shows how serviceable his 
techniques have proved, but still more because it sets forth a number of specific 
problems on which his help will be essential. 

The report is the joint production of a Working Party which consisted of 
about a dozen members of the British Psychological Society. With few excep- 
tions each contributed one or more chapters ; and, as was almost inevitable, the 
treatment varies appreciably in clarity and outlook. Thanks no doubt to the
care taken by the Editor and Secretary (Professor Vernon and Miss Doris Lee), 
the statistical issues are clearly brought out and the statistical evidence given full 
weight. 

The Preface warns the reader that “ the memorandum cannot represent the 
views of every member of the Society ; and no attempt is made to conceal that 
on some points differences of opinion or emphasis have arisen ". In certain 
cases a non-committal compromise appears to have been accepted. In others 
the contradictions are left unresolved ; and it is frankly admitted that 
“divergences of outlook remain". As several critics have already pointed out, it 
is perhaps to be regretted that the Working Party included none of the many 
experienced school teachers who are members of the Society nor any educational 
official or administrator whose memory could reach back to the practical 
difficulties which hampered the early reformers and which are so largely respons- 
ible for the shape the problem takes today. One consequence is that the issues 
involved are discussed as if they were primarily psychological, whereas it is the
administrative requirements that must determine any workable solution².

¹ In view of the interest which the problem has recently aroused and of the volume of
correspondence received since the Society’s inquiry was published, the Journal
Committee suggested that the book should be reviewed at some length. The task has been entrusted
to two writers who are fully acquainted with modern statistical methods and have also had first-hand
experience of the practical issues either as teacher or as administrator. The Editor regrets that space
does not permit the publication of the letters in this issue: but the reviewers have been asked to take
note of the points so raised. The full report recently published on the same subject by the National
Foundation for Educational Research was received too late for review at the same time; but it will
be discussed in the following number. The reviewers would welcome further correspondence, which
will be published or summarized at the same time.

² Miss Lee, who has been good enough to read this review in manuscript, points out that the
terms of reference received by the Working Party referred primarily to ‘psychological aspects
of secondary school selection’ (Editor).


The Inquiry starts with a discussion of the ‘ Background of Selection ’, and 
then passes to a ‘History and Survey of Present Procedures’. These two
chapters (as one of our correspondents observes) “ have apparently been drafted 
by writers who are more familiar, it would seem, with the recent findings of 
American psychologists than with the earlier efforts of English investigators ”. 
No reference is made to the work of such leading educationists as Ballard, 
Kimmins, Sir Robert Blair, Sir Percy Nunn, and Sir Michael Sadler, whose 
endeavours to place ‘ secondary school selection’ on a scientific basis were far 
more influential than the few who are mentioned by name. The bibliography 
of nearly 150 numbers provides a useful conspectus of recent publications. But 
only ten authors are cited who made contributions to the problem before 1930 ; 
and of these, six are American. 

The writer of the first chapter tells us that it was “the Butler Education Act
that gave us the blue print for the future of our schools which won unusually
unanimous approbation”; and he apparently supposes that the proposals then
made represented “an attempt to catch up with the changes in the structure of
society following the war”. In point of fact, there was hardly anything in the
Act which had not been suggested twenty years before; and much of the present
difficulty is due to the fact that even among progressive educationists the attitude
towards those suggestions has been far from unanimous. It would be
truer to say that the ‘blue print’ was originally provided by the most celebrated
of all the Reports of the Consultative Committee of the Board of Education,
the report on The Education of the Adolescent (1926). Yet, strangely enough,
the list of references at the end of the book omits all mention of it.

The inspiration behind that report came largely from Sir Percy Nunn,
Principal of the Institute of Education, and by far the most eminent educationist
of his day. It was he who so effectively urged the substitution of statistical studies
and statistical data for the impressionistic arguments that had previously prevailed;
it was his papers and memoranda which had argued so persuasively for
[...] in accordance [...] psychological research”; and it was he and his colleagues
who pressed for the use of psychological tests as an aid to selection
[...] the abolition of the ‘Rule 3(b)’ of the Board’s Regulations for
[...] had for so long hindered their adoption. For many
years he had urged¹ that the term “secondary” should be extended to cover all
[...] should be introduced, namely,
“schools with a realistic or practical trend to be known as ‘Modern Schools’”
(a name suggested as equivalent to the German term Realschule). And it was
his suggestion that, in order to be sure of catching in the net the [...] members
of each group, the proportion actually transferred to the [...] should
be raised to between 15 and 20 per cent, the actual figure varying according to
the population of the area. At the same time both Nunn and the Consultative
Committee envisaged many other types of school or class according to the
requirements and traditions of each district.

¹ T. P. Nunn, Education: its Data and First Principles (1920), pp. 237f.

We cannot therefore help feeling that the Working Party has represented 
the problem as much newer and far simpler than it really is. Moreover, in all its 
reports the Consultative Committee treated its various proposals as involving a 
process of allocating pupils to suitable schools rather than of selecting them for 
favoured types of school. Unfortunately, as the title of their ‘ Inquiry ' implies, 
the Working Party seem rather to have retained the popular notion of the pro- 
cedure, namely, that it is a kind of competitive selection for grammar schools. 
“At least half the child population,” they say, "is competing for grammar 
schools which will only accommodate one-fifth.” But that, plausible though it 
may seem to many, suggests a decidedly misleading picture of the fundamental 
problem. The real need at the moment is not so much more grammar schools 
for those who at present cannot get in, but more schools of a technical, commercial, 
or other specialized types. And, although at the end of the historical survey 
there is a short ‘ Note on Selection for Technical and Commercial Schools’ 
reminding the reader that “ this topic must not be omitted altogether ”, this 
aspect of the matter, and the numerous psychological questions which it involves, 
are in point of fact never really faced. 

Having outlined the general problem and described in somewhat vague 
terms ‘ the Diversity’ and ‘ the Adaptability of English Education ', the writer 
of the opening chapter goes on to ask why after all parents are so anxious to see 
their boys and girls accepted by a grammar school. His answer is that, when 
their child fails to win acceptance, parents from the middle classes “ feel that they 
are losing face ”, and as a result often prefer a relatively inefficient private school 
to the “ stigma of going to a modern school". Teachers who have enjoyed a 
wider first-hand knowledge of the reasons that actuate most parents apparently 
think this explanation a little unfair. Thus several of our correspondents are
of the opinion that (to quote one writer) “snobbery as a motive is far more
frequent among the manual classes than among parents of the professional or
middle classes: how often does one hear a father say: ‘I only let him go there
to show that he’s as good as any other kid; I never meant him to stay’”. As
Mr. Skeet has said¹, owing perhaps to some lurking sense of inferiority, the
working class parent “does not so much want a suitable education for his child,
but only that his child shall go to as good a school as their neighbour’s, i.e. to
the one that is most esteemed”. On the other hand, middle class parents
(if we may trust a survey recently carried out by Mr. R. T. Graham) tend as a
rule to adopt a far less egoistic attitude: with them the commonest objection
to the modern school is, “not that it entails a social disgrace, but rather a fear
lest the child may pick up rude manners, a low vocabulary, a slovenly accent,
slangy and ungrammatical speech, and above all disgusting words and ideas”.
By going to a grammar school with a long-standing tradition, the boy will (so the
parent hopes) not only escape such risks, but “acquire something of the cultural,
moral, and social ideals, the spirit of loyalty and sense of responsibility and [...]
service that are part of the very atmosphere of the English public schools and
established grammar schools”.

¹ D. V. Skeet, The Child of Eleven (University of London Press, 1957).

[...] of the psychological study of the problem the writers are
evidently more at home. Yet even here they seem to underestimate the work done
in this direction by the more progressive administrative authorities, [...]
notably the inquiries and reports undertaken by the London County Council
and by the Board of Education itself. It is to these bodies and to certain out- 
standing members of their staffs—Young, the Secretary of the Consultative 
Committee, Daniel, the Chief Examiner to the London County Council, 
Kimmins, its Chief Inspector, and Nunn, the Principal of its Training College 
(later known as the Institute of Education)—far more than to the academic 
writers whose names are better known, that the real initiative was chiefly due. 

As the Preface points out, the contribution of the psychologist as such starts
with “the pioneering work of Galton, who was interested in discovering innate
abilities of a high order as well as in assessing cases of subnormality”. But the
systematic use of intelligence tests as part of the educational routine began with
the London County Council, the first education authority to appoint a psychologist
as an officer on its educational staff. The duties of the psychologist were
not merely to apply psychological methods, but to investigate their practical
value and, where necessary, develop and recommend new procedures.

For the assessment of subnormal children, the most obvious procedure was
to use ‘individual’ tests, such as those devised by Galton and Binet. For the
scholarship examinations, intended to select brighter children for central schools
and grammar schools, the application of individual tests (the only type available
in the early days) would have been far too expensive; and accordingly a scheme
of written tests for the majority, checked by teachers’ ratings, with oral examinations
for borderline cases, was put forward for trial. The idea of employing
‘group’ tests started with [...]
out by Burt in 1909¹; and he was apparently responsible for most of the stock
types still used in [...]
Working Party’s report [...] Opposites, Analogies, Classification, Reasoning,
Number Series, [...] In view of recent complaints
of coaching, it is surprising that so few fresh [...]
forty years. Here, we feel, the [...]

[...] Spearman urged the substitution of “simple laboratory
tests, as befits the rigour of scientific work” (e.g. tests of sensory discrimination
individually applied); Thomson questioned the whole notion of a general
factor of intelligence, [...] controversies over the very existence

¹ Cf. ‘Experimental Tests of Higher Mental Processes and their Relation to General Intelligence’,
J. Exp. Pedag., I, 1911.



of ‘general ability’ and of ‘special aptitudes’ served only to augment the natural
scepticism of officials and committees¹. Nevertheless, the experiments carried
out by the London County Council aroused widespread interest among admini- 
strators; and, as soon as the first world war was over, the group tests standardized 
by Burt for London were tried by education authorities in Blackpool, Leicester, 
Northumberland, the West Riding of Yorkshire, and elsewhere, as well as for 
entrance examinations at Cheltenham Grammar School and at Rugby. All who 
used them pronounced favourably on the results. But the first authorities to 
introduce a separate intelligence test as part of their regular procedure were those 
whose areas included a mixture of both rural and urban schools: as one of the 
reports explained, "in country districts many pupils may be quite as well 
endowed in natural ability as those in the towns; but, for various reasons, notably 
the low standards often prevailing in the village schools, they are apt to reach a 
comparatively poor level in ordinary examinations of a scholastic type ”. 

In 1920 the Consultative Committee took up the problem at the request of
the Board. As regards the theoretical issue the Committee unhesitatingly
accepted the distinction between ‘inborn ability’ and ‘acquired attainments’,
and the double assumption of an ‘innate general ability’ and ‘innate special
aptitudes maturing rather late’; and concluded that “a number of children
qualified to profit by secondary education are at present excluded because,
for one reason or another, their attainments are not equal to their abilities. ...
Examinations for entrance to secondary schools should be designed primarily
to discover ability rather than to test attainments”. Accordingly “tests of
‘intelligence’” were recommended for adoption, in addition to other methods,
on the ground that they provided “a species of written or oral examination
designed to test ability rather than attainments”.

It is evident, however, that the Committee were not entirely satisfied with
the use of the word ‘intelligence’ to describe the tests used for this purpose.
The reasons are perhaps most clearly set forth in the memorandum by Burt
referred to in their report. A distinction is there drawn between the various
purposes for which so-called intelligence tests are used: in examining the borderline
defective it is desirable, before certifying the child, to ascertain, so far as
practicable, whether his backwardness is the irremediable consequence of an
innate lack of general ability or merely the incidental result of remediable causes;
but in examining a pupil for admission to a secondary school, it is less important
to assess his innate ability as such, since that is but one of many factors determining
his capacity to profit by the education there given. For instance, the
verbal and scholastic bias of many tests, which should be wholly eliminated in
tests for the defective, may actually prove an advantage in selecting pupils for a
...ly bookish and academic type of education.... It is, therefore, ... that the
phrase ‘intelligence test’ is used to describe two different ... having ...
different functions. The teacher in ... school, when receiving new pupils at the
age of 7 or 8, rightly seeks a means of
¹ See Board’s report on Psychological Tests of Educable Capacity ...,
which was largely concerned with these underlying issues: by the time it was
printed, both Thomson and Spearman had begun to modify their earlier views.




testing innate ability, since at that stage what he wants to know are the inborn
potentialities of each child; but that is no longer the primary purpose of scholarship
examinations at the age of eleven or later.” Here “the real criterion is,
not, how accurately does this procedure enable us to assess inborn potentiality,
but how accurately does it enable us to predict the child’s capacity to benefit by
the type of education proposed”. The Working Party state that the situation
now obtaining in Britain “clearly calls for some estimate of the child’s potentialities
at the end of the primary stage”. We should prefer to see this done at
the beginning. And for the test used to estimate the child’s suitability for the
grammar or modern school we should prefer some such name as ‘general classification
test’.

The equivocality of the term ‘intelligence’ is responsible for much confusion.
As the Working Party go on to remind us, “intelligence suggests something
inherited; thus the parent is much more emotionally involved in the
child’s I.Q. than in his performance at attainment tests or ordinary examinations”.
Accordingly one of their chief aims has apparently been to allay these parental
fears, which are the source of so much ill-informed opposition. They seem to
have thought that the best way to do so would be to minimize the importance of
hereditary differences and the possibility of determining them. But after all
it is the existence of these innate differences that causes the problem and makes it
of such national importance; and surely one can deny the need to test a thing for
practical purposes without denying its existence on theoretical grounds.

The chapter which deals with the actual use of so-called ‘intelligence tests’
is much more lucidly written than the rest. The author frankly accepts the
results of factor analysis, but rejects the extreme and antagonistic views of both
Spearman and Thurstone. “Though factor analysis does not enable us to pin
down the nature of intelligence,” he says, “it has nevertheless developed to a
stage where it can effectively analyse and classify the component abilities. ...
The factor g (for general ability) is,” we are told, “an empirical fact: as Burt
pointed out, it is the Highest Common Factor that can be extracted from analysing
any set of test scores”; but at the same time “he provided evidence for
numerous group factors or subtypes of ability.” In spite of Spearman’s arguments
therefore we must recognize both.

The writer then goes on to inquire “how far do intelligence tests measure
inborn ability?” This, however, as we have already implied, is really a side-issue;
and in any case the answer must surely be that all depends on which test
we have in mind and on the way it is applied. On the general problem several
extensive investigations have recently been published in the Society’s own journals
(... IX, p. ... and refs.) to which the Working Party does not
refer. It would appear that with a good test of intelligence about
77 per cent of the variance is attributable to genetic factors; and with supplementary
checks even this figure can be raised. For evidence on this issue, however,
the Inquiry relies mainly on earlier American research. In the end the
views of “left-wing writers” who hold that “intelligence test results largely or



mainly reflect social class differences” are firmly repudiated; and Lawrence’s
studies of children transferred to orphanages are quoted as “proving beyond
doubt that such social class differences [as the tests reveal] are in some measure
genuinely innate”.

So far, therefore, the conclusions seem to agree both with those of the
Consultative Committee and of the ordinary teacher¹. But, as we read on, we
encounter those unreconciled “divergences in outlook” that we are warned to
expect. The Preface, as we have seen, urges the need for estimating the child’s
innate potentialities, and describes how education authorities are now using
“tests designed to give an indication of innate ability as distinct from scholastic
performance”. Yet, on a later page we are assured that such tests “do not
measure innate ability”. Again in another passage, drafted presumably by
another hand, we are told that “by intelligence we mean the more general
qualities of comprehending, reasoning, and judging, which have been picked up
without much specific instruction”; and it is therefore argued that, if “I.Q.
tests give a fair indication of capacity for advanced education”, that is “not
because the capacity is inborn, but because it has been more or less stabilized in
the previous eleven years”. It is in our view perfectly true to say that what we
want to test in such examinations is ‘stabilized capacity for further education’;
but to accept this view does not mean that we must first deny genetic differences
or the possibility of assessing them.

The mention of ‘I.Q. tests’ leads to a discussion of the time-honoured
question of the constancy of the I.Q. “From about 6 to 11”, it is said, “the
I.Q. from a good individual test is unlikely to alter by more than about 7 points
on the average” (that should reassure teachers who wish to determine at an
early age which children are innately handicapped and which have the potentialities
of good scholars). Hence “one can definitely conclude that the I.Q. is
sufficiently stable to make possible useful predictions in the majority of cases
over the period of primary or secondary schooling”. But once again, having
thus confirmed the general view, the writers then proceed to make such large
concessions to the rival doctrine that the ordinary reader is bound to feel
bewildered. In certain cases, it is said, the I.Q. “may change 30 or 40 points; ...
many children with I.Q.s in the 50 to 70 range may show themselves capable
later of coping with education in the ordinary school”. Let us apply this to
children tested at 10: it means that a pupil who then had a mental age of 8 might
at 12 have a mental age of over 14, and that “many” who had mental ages
between 5 and 7 could later be satisfactorily educated in the ordinary school.
Surely in such cases the more reasonable inference would be that the I.Q.s were
obtained either by an incompetent tester or with unsuitable tests.
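
The arithmetic behind this example can be checked with the classical ratio definition of the I.Q. (mental age divided by chronological age, multiplied by 100). The sketch below assumes that definition, which the passage implies but does not state; the function names are illustrative only.

```python
def ratio_iq(mental_age, chronological_age):
    """Classical ratio I.Q.: 100 * mental age / chronological age (assumed definition)."""
    return 100.0 * mental_age / chronological_age


def mental_age(iq, chronological_age):
    """Invert the ratio definition to recover the mental age."""
    return iq * chronological_age / 100.0


# A pupil tested at 10 with a mental age of 8.
iq_at_10 = ratio_iq(8, 10)

# Suppose the I.Q. then rises by the largest change quoted, 40 points.
iq_at_12 = iq_at_10 + 40
ma_at_12 = mental_age(iq_at_12, 12)

# I.Q.s of 50 and 70 at age 10 expressed as mental ages.
ma_low = mental_age(50, 10)
ma_high = mental_age(70, 10)

print(iq_at_10, iq_at_12, ma_at_12, ma_low, ma_high)
```

On this definition a rise of 40 points between 10 and 12 implies a gain of more than six years of mental age in two calendar years, which is the implausibility the text is pointing to.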


¹ A few years ago Professor Vernon submitted a questionnaire to a number of teachers, asking
(among other things) how they understood the term ‘intelligence’: the definition “inborn all-round
mental efficiency” (or its equivalent) was accepted by 84 per cent and rejected by only 9 per
cent, the rest declining to commit themselves (The Bearings of Recent Advances in Psychology on
Educational Problems, 1955).




But all this is really irrelevant. Whether or not group tests of the usual
type are of practical help in the selection examination is an issue to be decided,
not by inquiring how far “the I.Q., properly measured, is stable or not”, but
by ascertaining how far the tests actually do predict the children’s subsequent
performances. And to the statistical reader the portion of greatest interest will
be the chapter dealing with “validity”, a problem which for some reason is
discussed before the description of the tests and other procedures whose validity
is to be assessed. For assessing it the writer advocates the use of multivariate
analysis and the discriminant function; but he does not explain how these
methods would be applied or quote their results. The substance of his review
seems excellent; but his style is by no means lucid. If we interpret him
correctly, he holds that two main problems call for special study. First, how far
does the present system “select or reject the right pupils”, or, as he puts it
later, “to what extent do the predicting tests place pupils in the right order of
their potentiality to succeed?” Secondly, what should be “the size of the
grammar school group”, or, in other words, “does the system take too many
or too few?”


But there is, as he points out, a prior question to be settled: how are we to
judge the success? The verdicts of the secondary school teacher are, he
believes, not sufficiently trustworthy. Nor is he satisfied with an external
examination like the ‘G.C.E.’. The best criterion, he thinks, would be an
internal examination conducted by the school. Personally we are tempted to
argue that profiting by what is intended to be a liberal type of education should be
judged, not by any sort of examination, but on a genuinely liberal basis. However,
the evidence actually available rests, not on a single criterion, but on “the
cumulative effect of numerous investigations” (the more important results are
tabulated in a useful appendix); and the outcome “greatly increases our confidence
in the efficiency of the selection procedure”. But at this point the
Working Party seems to become unexpectedly optimistic. “Throughout this
report,” says an inserted footnote, “we have assumed that the correlation of an
efficiently conducted selection procedure with a perfectly reliable scholastic
criterion is close to 0.90.” This refers to “validities for over 5 years”. And
the practical conclusion drawn is that “little gain in predictive efficiency can be
expected from further tinkering”¹.

But is not the figure suggested a little too high? Since mental growth
during adolescence is itself liable to unpredictable fluctuations and since a child’s
... must be influenced by these and many other unforeseen
accidents, there can be no such thing as a ‘perfectly reliable scholastic criterion’
in any literal sense. Moreover, even the reliability coefficient of the selection
procedure seldom rises as high as 0.90. The validity coefficients which
approach this figure have all been obtained by ‘boosting’ the observed value to

¹ In the final chapter on ‘Recommendations’ the essential conclusion is more carefully
expressed: “the intelligence test is so consistently successful that it should not be dispensed with,
though it might be improved”.




allow for selection. The observed correlations, we are told, are “typically of the
order of 0.45”. Consequently a theoretical value of 0.90 would virtually
double the observed coefficient.
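
The ‘boosting’ referred to here is a correction for restriction of range. As a rough check on its size, the sketch below applies the standard univariate correction (often called Pearson's or Thorndike's ‘Case II’). It rests on assumptions that the report does not state: that selection is made directly on a normally distributed predictor, that the top 20 per cent are admitted, and that the regression is linear with constant scatter. The function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def sd_after_top_selection(p_select):
    """SD of a standard normal score among the top `p_select` fraction."""
    std = NormalDist()
    cut = std.inv_cdf(1.0 - p_select)   # cutting score in SD units
    lam = std.pdf(cut) / p_select       # mean score of the selected group
    return sqrt(1.0 + cut * lam - lam * lam)


def corrected_validity(r_observed, sd_ratio):
    """Case II correction; sd_ratio = unselected SD / selected SD."""
    rk = r_observed * sd_ratio
    return rk / sqrt(1.0 - r_observed ** 2 + rk ** 2)


sd_ratio = 1.0 / sd_after_top_selection(0.20)   # assumed 20 per cent entry
estimate = corrected_validity(0.45, sd_ratio)   # 0.45 is the figure quoted
print(round(sd_ratio, 2), round(estimate, 2))
```

The printed estimate can be set beside the 0.90 assumed by the Working Party. A different selection ratio, or selection on a composite rather than on the test itself, would change the result.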

However, having assumed a validity of 0.90 and a selection-ratio of 20 per
cent, the writer infers that the number of pupils wrongly placed would be about
10 per cent of the total; one quarter of those selected (5 per cent of the total)
would not really deserve their places, while another 5 per cent (1 in 16 of the
failures) would presumably have made good in the grammar school had they been
given the chance. Now this method of assessing the results implies a further
assumption which seems scarcely tenable. Why must we suppose that the
number of pupils who genuinely deserve places (as judged by the criterion)
must be exactly equal to the number who are given the chance (as judged by the
selection procedure)?
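
The kind of calculation described above can be sketched as follows. The sketch assumes, as such tables usually do, that predictor and criterion follow a bivariate normal distribution; it is not the Working Party's own computation. It also lets the proportion deserving places (`p_success`) differ from the proportion selected (`p_select`), which is the point raised in the closing question. All names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()


def joint_upper_tail(rho, p_select, p_success, steps=4000, upper=8.0):
    """P(selected and successful) under a bivariate normal model (Simpson's rule)."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    a = STD.inv_cdf(1.0 - p_select)    # cutting score on the predictor
    b = STD.inv_cdf(1.0 - p_success)   # cutting score on the criterion
    scale = sqrt(1.0 - rho * rho)

    def integrand(x):
        # density of the predictor times P(criterion > b | predictor = x)
        return STD.pdf(x) * (1.0 - STD.cdf((b - rho * x) / scale))

    h = (upper - a) / steps            # steps must be even for Simpson's rule
    total = integrand(a) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(a + i * h)
    return total * h / 3.0


def misplacement(rho, p_select=0.20, p_success=0.20):
    """Proportions of the whole age-group wrongly admitted and wrongly rejected."""
    both = joint_upper_tail(rho, p_select, p_success)
    wrongly_admitted = p_select - both
    wrongly_rejected = p_success - both
    return {
        "wrongly_admitted": wrongly_admitted,
        "wrongly_rejected": wrongly_rejected,
        "share_of_selected": wrongly_admitted / p_select,
    }


result = misplacement(0.90)   # the validity and selection-ratio quoted above
print(result)
```

Calling `misplacement(0.90, p_select=0.25, p_success=0.10)`, for instance, shows how the two kinds of error cease to be equal once the number admitted is allowed to exceed the number expected to succeed.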

Before we can properly settle this point, we must take up the second of
the problems mentioned above; and this in turn involves two distinct questions.
First of all, how many should we actually transfer? The Consultative
Committee, it may be remembered, suggested an entry of about 20 per cent, with
wide variations for different areas. The Working Party as a whole favours a
much higher proportion, namely, 25 to 30 per cent. On the other hand, the
only educational administrator included among their members suggests a lower
figure, 15 per cent. The main principle governing the Working Party’s argument
is apparently an overriding anxiety lest injustice should be done to the less
favoured individuals. But in our efforts to give the benefit of the doubt to every
questionable case, we may do serious injustice to the community: for we shall
almost inevitably lower the standard of teaching and culture in the schools that
receive these poorer specimens: and we shall be frittering away our limited
financial resources on pupils who at best are rather dubious speculations¹.

But there is another question which we ought to ask: how many are there
in the total population who really can profit by this more costly type of education?
When a teaching institution in a University selects candidates for training with a
view to a degree in medicine, let us say, it does not expect every one of those
accepted to pass the final examination: the teaching course itself provides a
period of probation. And the number who eventually pass will be much
smaller than the number who were originally accepted. Similarly, in accepting
candidates for a grammar school, there are two distinct numbers to be borne in
mind: (i) we want, so far as we can, to ensure that the ablest members in the
total population (the top 2, 5, or 10 per cent or whatever it may be) shall receive
an education suited to their ability and aptitudes; but (ii) in order to do so with
as few omissions as possible, it will be necessary to admit as many as we can
¹ As many as 18 per cent now leave the grammar school before 16, and 49 per cent at 16; and
the proportion of early leavers is, as a rule, much greater among pupils from the manual classes and
in areas which select a high proportion for entrance: cf. Central Advisory Council ...,
Early Leaving, H.M. Stationery Office, 1954. The social and cultural dangers involved in the
recent changes in educational organization are ably discussed in The Uses of Literacy (1956) by Richard
Hoggart, himself a member of the working classes, who profited in this way.




manage, having due regard to the need (a) not to waste money on those who are
bad bets, (b) not to lower the standards of the school, and (c) not to bring
undue strain on the less able pupils who are accepted. It follows that the
number of transfers should not be equal to, but appreciably larger than, the
total number of successes as estimated for the entire population.

The disagreement over the number proposed for transfer by different
members of the Working Party seems to reflect a basic weakness in the whole
approach. Throughout there is a tendency to assess the practical value of a
selection procedure (or of a component test) solely by reference to its theoretical
validity. Now what an education committee wants to know is whether the
merits of any new procedure will justify the expenditure that it will presumably
entail. And this question can hardly be answered unless we can first attempt
some fairly definite estimate for both costs and merits.

But, it may be asked, how can this be done except in terms of merely
impressionistic arguments? A careful scrutiny of the literature reveals plenty of
parallel examples and a variety of available devices. However, the point is so
important, and yet so com... to cite a concrete illustration.


Scales in Examinations for Free Places’, 
formula for determining * mathematical expe 
on higher algebra), viz., ‘x, = biG-qL 


€ 
, t 


q; their respective probabilities with this pr 


f components and their respective probabilities; 


inations involves interpolation, which 
ed, is based on that used b. 
ues of th of pleasure over pain: F. Y. Edgeworth, 
ethod followed wo; in some ways to be the reverse 
dures’, viz., minimizing the 
' minimizing ‘ risk 

n adopted in actual practice’. 






The conclusions reached afford an excellent instance of the fallacy of basing
practical recommendations on theoretical validities alone. The procedure with
the highest validity was one which included a separate intelligence test, as well as
the usual papers in English and Arithmetic and the teachers’ marks; yet it is
shown that, in a well-organized educational area like London, the additional
information supplied by the intelligence test was too slight to warrant the additional
cost; while in mixed areas (partly rural and partly urban), it appeared,
such a test might amply justify its cost.

In Britain, where the funds at their disposal have been strictly limited, the
pioneers of applied psychology, whether in education or in industry, have always
paid special attention to questions of profit and cost. And today, even in
America, arguments of this kind are now a commonplace. As one educationist
has put it, “in every investigation where practical issues are involved, statistical
theory should insist that considerations of cost and of the consequences of the
decisions should be taken into account”¹. It is therefore surprising that this
aspect of their problem is not touched upon in the Working Party’s ‘Inquiry’.

A short chapter is devoted to the assessment of attainments in school subjects 
and to the supplementary techniques available for borderline cases, such as the 
study of children’s interests, the assessment of character qualities, and the 
value of interviews and oral examinations. An ‘oral interview’ (with individual
tests) was advocated by the Consultative Committee as a useful adjunct to the 
ordinary examination; and a recent survey shows that approximately 70 per cent 
of the various County Councils now adopt some kind of interview for doubtful 
cases. This the Working Party vigorously condemns. But none of their 
reasons seems very convincing. 

First, we are told, an interview “puts a strain on many children”; and
case-studies are quoted to illustrate how a dread of the examination may induce
nervous stress. But, as most school doctors would agree, nervous strain is far
commoner among the dull but conscientious pupils after they have been transferred
to the grammar school than it is among candidates for the examination.
A tactful interview would do much to obviate such consequences. Secondly,
“interviewers are liable to be impressed by the wrong things, such as deportment
or accent”. But it surely should be part of the trained interviewer’s
technique not only to put the child at his ease, but also to guard against his own
personal prejudices. Thirdly, “wartime research confirmed psychologists’
doubts as to the consistency of the interviewers and the validity of their judgments”.
However, unlike teachers, most wartime psychologists were new to
such tasks; and the attempt to judge the personality of adults is a problem quite
different from assessing the ability and character of children of eleven.

Finally, it is said, “whenever conclusions derived from interviewing children
are allowed to outweigh evidence from test-scores and teachers’ ...
accuracy of prediction is likely to be lowered”. But the object of the interview


¹ M. A. Girshick (Rev. Educational Research, XXIV, 1954, pp. 448f) who gives a useful
survey of modern statistical procedures of this type.




is not to secure evidence which shall “outweigh” other estimates, but rather to
check and supplement them, particularly where the results of the tests and the
estimates of the teacher are ambiguous or at variance. At the same time, it affords
a convenient occasion for collecting and coordinating further information about
the child’s aptitudes, interests, behaviour and past history, for hearing the views
of the parents, and above all for making sure that no accident of ill health or
transitory disturbance was responsible for the low marks obtained for the
written examination. Every grammar school master would prefer to see the
borderline candidates himself before any decision is taken as to whether they
are suitable for his school, while the head of the primary school would, if present,
feel reassured that no case was turned down without full consideration.
Certainly the N.U.T. Committee of 1949 looked on the old-fashioned interview
with disfavour. But their objections scarcely apply to what is sometimes called
the ‘new style of interview’¹. In short, the disadvantages alleged seem to be
reasons, not for discarding the interview, but for improving it.

The report concludes with thirty-two italicized recommendations. The
majority are excellent in substance, though not always felicitously worded. At
present, of the local education authorities in England and Wales, barely one
quarter enlist the services of a psychologist. “Psychologists,” says the report,
“should uphold their claim to be able to diagnose the most suitable form of
schooling.” But the ability to ‘diagnose’ a suitable school is primarily an
educational rather than a psychological problem; and the majority of educational
psychologists are more familiar with the diagnosis of dull, backward, and neurotic
pupils than with the ‘diagnosis’ of schools. What the psychologist can do,
provided he possesses the requisite training and experience, is to help in the
construction, standardization, and application of tests, and offer advice on individual
children whose ability or character seems problematic. But much of this
advice could and should be given long before the stage of selection is reached.

With the more practical recommendations this is scarcely the place to deal.
But the detailed suggestions as to the theoretical problems calling urgently for
research may well be brought to the notice of our readers. The following are
the most important: “the broader aspects of benefiting from a grammar-school
course, and their predictability,” “the development of intellectual capacities in
adolescence”, “‘old-type’ and other forms of examination or assessment which,
without loss in predictive efficiency or undue unreliability in scoring, might have
beneficial rather than harmful backwash effects,” and finally “the general
scaling of School marks and estimates” (on this last problem there is a most
instructive statistical note in one of the appendices). In all these inquiries, the
Working Party rightly “insists that the procedures should be planned and
supervised by persons with adequate psychological and statistical training”.

To the statistical psychologist perhaps the most interesting feature of the 
whole inquiry will be its incidental and unanimous vindication of statistical 

¹ See Mr. Skeet’s account of the ‘new style interview’ (as worked out, e.g. by the Northumberland
authority in collaboration with Professor Thomson): The Child of Eleven, pp. 131-135.




procedures as the most objective and profitable means available for studying
complex practical issues like that of selection. At the same time both its merits
and its shortcomings illustrate the supreme importance of combining the theoretical
approach of the statistical psychologist with the practical experience of the
teacher and administrator. Many of the investigations cited have been carried
out by research-students, unfamiliar with the conditions in the classroom and
the home, and often working under purely academic guidance. It therefore
seems high time that a larger number of education committees should follow the
example already set by one or two of the most progressive, and appoint experienced
psychologists as members of their own administrative staffs with full authority
to undertake systematic surveys and cooperative studies, scientifically planned,
of the various problems arising within their areas.

A. G. HOUGHTON
D. R. MORRIS


BOOK REVIEWS 


Statistical Methods and Scientific Inference. By R. A. FISHER. London: Oliver
and Boyd, 1956. Pp. viii+175. 16s.

This is a book which every statistical psychologist should read. The author explains
that in his two previous volumes he had in view two practical aims; in Statistical Methods
for Research Workers his object was to modify traditional statistical procedures (based on
the theory of infinitely large samples) in order to meet the ... quality of the data
... of the procedures involved. In these earlier volumes, ... had not permitted a full
discussion of “those refinements ... which have come clearly into view with a study of
these two ...”. ... these further refinements that Sir Ronald Fisher’s
new book is mainly concerned.

Karl Pearson’s statistical and biological concepts are somewhat severely criticized.
... are described as clumsy and sometimes misleading (e.g. in problems of
...); in his dispute with ... Mendelian ... the bull to the skilful matador”.
The newer theories of ... are recognized as useful in the fields of industrial and
commercial psychology for which they were originally designed; but their adoption as
aids to forming correct scientific conclusions is firmly rejected.

W. R. NORTON




An Introduction to Cybernetics. By W. Ross ASHBY. London: Chapman and Hall,
Ltd., 1956. Pp. x+295. 36s.

Dr. Ashby’s aim has been to provide an introductory textbook on ‘cybernetics’.
His definition of the term is “the science of control and communication in the animal and
the machine: in a word, ‘Steersmanship’”. As he explains, the present volume is
complementary to his earlier work, Design for a Brain. It falls into three main parts:
the first discusses processes developing within a single system; the second, processes of
communication between one system and another; and the last, the problem of regulation
and control in biological systems. No knowledge of mathematics is required beyond
elementary algebra; and each section is followed by graded exercises with hints and
answers.

Starting from a few elementary concepts, ‘operands’ (i.e. states or properties of
‘systems’), ‘operators’, ‘transitions’ (i.e. the changes effected by the operations),
repeated transitions, and the like, the author shows how each can be made more precise
by formal definition and classification. A complex set of transitions, produced by operating
simultaneously on a number of ‘operands’ within the same system, is called a ‘transformation’,
and is represented symbolically by the multiplication of matrices. The series of
states exhibited by such a system at successive times is expressed by the successive application
of the appropriate transformation, and is said to define a ‘trajectory’ or ‘line of
behaviour’.
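
The notions of transformation and trajectory can be made concrete in a few lines. The sketch below is an illustration and is not taken from the book: it represents a transformation as a mapping from each operand to its transform (so that it is single-valued by construction), checks whether it is closed in the sense that every transform is itself an operand, and generates a trajectory by repeated application. The states ‘a’, ‘b’, ‘c’ and the function names are hypothetical.

```python
def is_closed(transformation):
    """True if every transform is itself one of the operands."""
    return all(image in transformation for image in transformation.values())


def trajectory(transformation, start, steps):
    """States reached from `start` by applying the transformation `steps` times."""
    state = start
    path = [state]
    for _ in range(steps):
        state = transformation[state]
        path.append(state)
    return path


# Hypothetical three-state system: a -> b, b -> c, c -> c.
T = {"a": "b", "b": "c", "c": "c"}

line_of_behaviour = trajectory(T, "a", 4)
print(is_closed(T), line_of_behaviour)
```

The same mapping could equally be written as a square matrix of 0s and 1s, with repeated application corresponding to repeated matrix multiplication; the dictionary form is used here only for brevity.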


... enquiries, the first step ... the abstract properties of ...
of the matrices and the matrix-operations. We are ...
...ption of a ‘determinate machine’. This is defined as ‘that
which behaves in the same way as a closed single-valued transformation’. The definition,
it will be noted, turns not on the nature of a material structure, but on its mode of
behaviour. The point is illustrated by what the electrical engineer would call ‘the problem
of the black box’. We are confronted with an apparatus possessing terminals for input
to which he may apply any stimulation he pleases, and terminals for output, which will
enable him to observe the results: the box itself cannot be opened; and the problem is to
... or infer its inner structure in the li...


In Part II we are introduced to a mechanism whose behaviour is stochastic, i.e.
“determinable only in a statistical sense”; and th... the first part of the book are
... of such processes as coding, ... a number of psychological problems ...
points. A special characteristic of t... studied by the psychologist is that they seek
goals. And goal-seeking ... around a state of equilibrium ... as he ... out,
...amples of stability so attained are ‘goals’, ... not all of them are ‘desirable’.
...ire’ apparently implies ... ‘value’ or ‘satisfaction’ which, it would seem, can never
be reduced to ... mechanistic concepts. Generally, he suggests, “it should be possible
for the ...ically minded psychologist to devise a topological approach which will enable
him to ... the information he needs”. The ... interesting section on “ampli...
... term). Problem-solving, whe... can be amplified, ... amplified”. As an instance of
such ... grow up to form a brain”.




Dr. Ashby's psychological illustrations have been somewhat severely criticized by
psychologists themselves; and this has led many to suppose that his methods would be
of little use in psychology. The present reviewer would dissent strongly from that verdict.
The proposals outlined in his book might perhaps have carried more conviction had he
referred in fuller detail to researches already carried out by psychologists, where similar
concepts and procedures have already been employed with success. Stout, for example,
over half a century ago asked “whether we can find a mode of physical self-determination,
which, like the apperceptive and conative processes studied by the psychologist, is directed
towards an end, and ceases when the end is attained”; and found it in the process of
‘recovering equilibrium after disturbance’ by means of ‘adaptation to irregularly varying
outer condition’ through ‘indirect’ (or ‘re-entrant’) ‘self-determination’ (i.e. through
feedback). The temporal sequence of states which results he terms a ‘vital series’ (an
inappropriate phrase borrowed from Avenarius) (Analytic Psychology, I, pp. 151f.).

Lewin attempted a ‘topological’ approach (Principles of Topological Psychology,
1936); but at that date the theory of analysis situs was not sufficiently advanced for much
progress to be made. Burt’s treatment of the subject, familiar to most readers of this
Journal, comes nearest to that envisaged by Ashby. In Burt’s terminology Ashby’s ‘systems’
become ‘persons’, the ‘operands’ ‘traits’, the ‘temporal series’ ‘occasions’, and the
‘operators’ ‘factors’ (i.e. matrix multipliers containing ‘weights’ or ‘loadings’). If
necessary, qualitative traits may be studied instead of metrical by substituting association
coefficients for correlation coefficients. Following Stout, Burt regards the essential
process to be the recovery of equilibrium after disturbance; but instead of a single stimulus
or response we have patterns, expressed by vectors or matrices. The use of matrix models,
taken over by Burt from Sheppard, is of long standing in psychology. The application
of the ‘principal axis transformation’ then enables the investigator to express the whole
process in ‘canonical’ form; and the use of O-technique, based on time, in place of the
more familiar T- or P-techniques renders it possible to “deal with dynamic instead of
merely static problems”. Thus the processes of learning and growth are expressed by
repeated multiplication by a factor matrix consisting of transitional [...], in the simplest
case, to a Markov chain (also suggested by Ashby); and what Ashby calls ‘trajectories’ are
expressed by ‘autocatalytic curves’.* These detailed similarities between the procedures
worked out by two independent investigators seem to offer strong confirmation of their
value.
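
The idea of expressing a process by repeated multiplication by a matrix, which in the simplest case gives a Markov chain, may be made concrete by a small sketch. Everything specific in it is an assumption introduced for illustration: the two states (‘unlearned’ and ‘learned’), the transition probabilities in `P`, and the function names. It is not a reconstruction of Burt’s or Ashby’s own procedures.

```python
def step(distribution, transition):
    """Multiply a row vector of state probabilities by a transition matrix."""
    n = len(transition)
    return [
        sum(distribution[i] * transition[i][j] for i in range(n))
        for j in range(n)
    ]


def evolve(distribution, transition, steps):
    """Return the successive distributions obtained by repeated multiplication."""
    history = [distribution]
    for _ in range(steps):
        history.append(step(history[-1], transition))
    return history


# Hypothetical learning process: state 0 = 'unlearned', state 1 = 'learned'.
# Each row gives the probabilities of moving from that state; rows sum to 1.
P = [
    [0.5, 0.5],  # an unlearned item is learned on half of the occasions
    [0.0, 1.0],  # a learned item stays learned
]
history = evolve([1.0, 0.0], P, 3)
for occasion, dist in enumerate(history):
    print(occasion, dist)
```

With these assumed figures the proportion ‘learned’ rises from 0 to 0.5, 0.75 and 0.875 on successive occasions, so the sequence of distributions plays the part of a trajectory approaching equilibrium.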

One common criticism advanced by psychologists is that the problems attacked along
these lines by statistical investigators (e.g. those of mental [...] learning [...] like)
cannot be solved until the underlying [...]. To this objection [...] studying his ‘black
box’ seems to [...]: after all one can discover a great deal about the behaviour of a new
[...]dling knobs and noting the results without ever peering inside the case or [...] the
diagram of the wiring. The [...] practical importance [...] psychological problems
themselves [...] that it is urgently desirable to seek at least a provisional solution
without waiting [...] half a century for the internal mechanisms to be revealed [...] in
these fields Dr. Ashby's book forms one of the most [...] during recent years.

T. F. Ramsay


* Cf. this Journal, [...], III, pp. 166f., V, p. [...], and IX, pp. [...]f. For some of the
analogies between [...] methods and cybernetic methods I am indebted to a lecture given
by Professor Burt at Worcester.




INDEX TO VOLUME X


INDEX OF AUTHORS

PAGE
Bernyer, G.: Psychological Factors: their Number, Nature, and Identification
Burt, C., and Howard, M.: Heredity and Intelligence: A Reply to Criticisms 33
Burt, C., and Howard, M.: The Relative Influence of Heredity and Environment
on Assessments of Intelligence 99
Dunsdon, M. I., and Roberts, J. A. F.: A Study of the Performance of 2,000
Children on Four Vocabulary Tests 1
Hotelling, H.: The Relations of the Newer Multivariate Statistical Methods to
Factor Analysis 69
Howard, M., and Burt, C.: Heredity and Intelligence: A Reply to Criticisms 33
Howard, M., and Burt, C.: The Relative Influence of Heredity and Environment
on Assessments of Intelligence 99
Lewis, W. B.: Professor Lewis M. Terman 65
Roberts, J. A. F., and Dunsdon, M. I.: A Study of the Performance of 2,000
Children on Four Vocabulary Tests 1
Stuart, A.: The Comparison of Frequencies in Matched Samples 29
Walker, D. A.: The Theory and Use of the Success-Ratio 105
Wrigley, C. F.: The Distinction between Common and Specific Variance in Factor
Theory 81


INDEX OF SUBJECTS

Common and Specific Variance in Factor Theory, The Distinction between:
Wrigley, C. F. 81
Frequencies in Matched Samples, The Comparison of: Stuart, A. 29
Heredity and Intelligence: A Reply to Criticisms: Burt, C., and Howard, M. 33
Heredity and Environment on Assessments of Intelligence, The Relative Influence
of: Burt, C., and Howard, M. 99
Newer Multivariate Statistical Methods to Factor Analysis, The Relations of the:
Hotelling, H. 69
Psychological Factors: their Number, Nature, and Identification: Bernyer, G.
Success-Ratio, The Theory and Use of the: Walker, D. A. 105
Terman, Professor Lewis M.: Lewis, W. B. 65
Vocabulary Tests, A Study of the Performance of 2,000 Children on Four: Dunsdon,
M. I., and Roberts, J. A. F. 1

BOOKS REVIEWED

Ashby, W. Ross: An Introduction to Cybernetics
Cronbach, L. J., and Gleser, G. C.: Psychological Tests and Personnel Decisions
Fisher, R. A.: Statistical Methods and Scientific Inference
Sholl, D. A.: The Organization of the Cerebral Cortex
Vernon, P. E. (ed.): Secondary School Selection




GRIFFIN’S STATISTICAL MONOGRAPHS & COURSES

Edited by M. G. KENDALL, Sc.D.

A new series of publications planned to meet the need for a
form of book, at moderate cost, which will make accessible,
to a group of readers, specialized studies in statistics or
special courses on particular statistical topics.

No. 1. ANALYSIS OF MULTIPLE TIME-SERIES
M. H. QUENOUILLE 24s. net

No. 2. A COURSE IN MULTIVARIATE ANALYSIS
M. G. KENDALL 22s. net

No. 3. THE FUNDAMENTALS OF STATISTICAL REASONING
M. H. QUENOUILLE In the press


THE MATHEMATICAL THEORY OF EPIDEMICS

By NORMAN T. J. BAILEY, M.A.
Reader in Biometry, University of Oxford.

A valuable survey, with the main emphasis on the biometrical
and epidemiological rather than the mathematical aspects.

General introduction; Historical sketch of mathematical epidemiology;
Epidemiological principles; Deterministic theory; Stochastic theory:
continuous-infection models; Stochastic theory: chain-binomial models;
The measurement of latent and infectious periods; Recurrent epidemics and
endemicity; The detection of infectiousness; Summary and conclusions;
Index. [...]s. net


CHARLES GRIFFIN & COMPANY LIMITED
42 DRURY LANE LONDON W.C.2


OCCUPATIONAL PSYCHOLOGY

Editor: ALEC RODGER

October, 1957 Volume 31, No. 4

Engineering Psychology: A British Psychological Society Symposium:
Introductory Remarks [...]
Information Theory in the Understanding of Skills [...]
The Objective Study of Judgment and Decision-Taking [...]
Discussion of the Preceding Papers
The Design and Layout of Machinery for Industrial Operatives W. T. SINGLETON and R. SIMISTER
Transfer of Training in Engineering Skills D. C. FRASER
Environmental Stress and Its Effects on Performance B. UNGERSON
Discussion of the Later Papers JOHN CROWTHER

[...] in the Divisions of One Company, 1950-1955
[...] Size and Stability upon the Effectiveness of an Incentive Payment System LESLIE BUCK

An Outline of the History and Work of the Medical Research Council's Industrial Psychology Research Group.
By R. MARRIOTT.

Book Reviews
Other Books Received
Index

Annual Subscription 30 shillings

National Institute of Industrial Psychology, 14 Welbeck Street, London, W.1


BRITISH JOURNAL OF STATISTICAL PSYCHOLOGY

Vol. X Part II November 1957

CONTENTS
Page
LEWIS, W. B. Professor Lewis M. Terman 65
HOTELLING, H. The Relations of the Newer Multivariate Statistical
Methods to Factor Analysis 69
WRIGLEY, C. F. The Distinction between Common and Specific
Variance in Factor Theory 81
BURT, C., AND HOWARD, M. The Relative Influence of Heredity and Environment
on Assessments of Intelligence 99
WALKER, D. A. The Theory and Use of the Success-Ratio 105
JOURNAL COMMITTEE [...]
HOUGHTON, A. G., AND MORRIS, D. R. ‘Secondary School Selection’ (Critical Notice) 113
BOOK REVIEWS 126
INDEX




JOURNAL COMMITTEE

CYRIL BURT Editor
CHARLOTTE BANKS and ALAN STUART Assistant Editors
J. W. WHITFIELD Managing Sub-editor

and

B. M. FOSS J. S. SMALL
M. HAMILTON J. SUTHERLAND
A. HERON F. W. WARBURTON

